22438, 23553, 25278, AND 26212 NOVEL HUMAN SULFATASES

FIELD OF THE INVENTION 

The present invention relates to newly identified human sulfatases. In particular, 
the invention relates to sulfatase polypeptides and polynucleotides, methods of detecting 
the sulfatase polypeptides and polynucleotides, and methods of diagnosing and treating 
sulfatase-related disorders. Also provided are vectors, host cells, and recombinant 
methods for making and using the novel molecules.

BACKGROUND OF THE INVENTION 

The biology and functions of the reversible sulfation pathway catalyzed by
human sulfotransferases and sulfatases have been reviewed by Coughtrie et al.
(Chemico-Biological Interactions 109: 3-27 (1998)). This review, summarized below, focuses on
the sulfation of small molecules carried out by cytosolic sulfotransferases rather than the
sulfation of macromolecules and lipids catalyzed by membrane-associated
sulfotransferases.

Sulfation functions in the metabolism of xenobiotic compounds and in steroid
biosynthesis, and it modulates the biological activity, inactivation, and elimination of
potent endogenous chemicals such as thyroid hormones, steroids, and catechols. This
pathway is reversible, comprising the sulfotransferase enzymes that catalyze sulfation
and the sulfatases that hydrolyze the sulfate esters formed by the action of the
sulfotransferases. Accordingly, the interplay between these families regulates the
availability and biological activity of xenobiotic and endogenous chemicals. The
sulfatases, including the arylsulfatases (ARS), are located in lysosomes or the endoplasmic
reticulum.

The presence of sulfated components depends upon the availability of key
members of the sulfation pathway, i.e., the substrate and the activated sulfate donor molecule
(co-substrate), and upon the balance between sulfation and sulfate conjugate hydrolysis, which
depends upon the activity and localization of the sulfotransferases and the sulfatases.

Essentially, divalent sulfate is first converted to adenosine 5'-phosphosulfate (APS) in a
reaction that consumes ATP and releases pyrophosphate. APS is in turn converted to
3'-phosphoadenosine 5'-phosphosulfate (PAPS) by transfer of a phosphate from a second
ATP, which is converted to ADP. A sulfotransferase then transfers the sulfate group of PAPS
to an acceptor, for example converting 4-nitrophenol to 4-nitrophenol sulfate, and PAPS is
concurrently converted to adenosine 3',5'-bisphosphate. An ARS can then cleave the sulfate
from the 4-nitrophenol sulfate to regenerate the original 4-nitrophenol. This cycle forms the
basis of the sulfation system in humans. Over- or under-production of any of these key
molecules can result in sulfate-related disorders. For example, the brachymorphic mouse has a
connective tissue disorder resulting from a defect in PAPS formation that causes
undersulfated cartilage proteoglycans.
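The reaction sequence above can be checked with simple species-level bookkeeping. The Python sketch below is illustrative only: it assumes the standard stoichiometry in which the first ATP yields APS plus pyrophosphate (PPi) and the second ATP is converted to ADP, uses 4-nitrophenol as the acceptor as in the text, abbreviates adenosine 3',5'-bisphosphate as PAP, and ignores protons, charges, and metal ions. The step labels and the `net_reaction` helper are hypothetical.

```python
from collections import Counter

# Hypothetical step list for one turn of the sulfation/desulfation cycle.
# Each step: (label, substrates, products). Species-level bookkeeping only;
# protons, charges and metal ions are deliberately ignored.
STEPS = [
    ("sulfate activation", {"ATP": 1, "sulfate": 1}, {"APS": 1, "PPi": 1}),
    ("APS phosphorylation", {"APS": 1, "ATP": 1}, {"PAPS": 1, "ADP": 1}),
    ("sulfotransferase", {"PAPS": 1, "4-nitrophenol": 1},
     {"PAP": 1, "4-nitrophenol sulfate": 1}),
    ("arylsulfatase (ARS)", {"4-nitrophenol sulfate": 1, "H2O": 1},
     {"4-nitrophenol": 1, "sulfate": 1}),
]


def net_reaction(steps):
    """Sum all steps and cancel species that are both produced and consumed.

    Returns (consumed, produced) as plain dicts of net stoichiometric counts.
    """
    balance = Counter()
    for _label, substrates, products in steps:
        balance.subtract(substrates)  # substrates count as negative
        balance.update(products)      # products count as positive
    consumed = {s: -n for s, n in balance.items() if n < 0}
    produced = {s: n for s, n in balance.items() if n > 0}
    return consumed, produced


if __name__ == "__main__":
    consumed, produced = net_reaction(STEPS)
    print("consumed:", consumed)
    print("produced:", produced)
```

Under these assumptions, sulfate and 4-nitrophenol cancel out, leaving two ATP and one water consumed and PPi, ADP, and PAP produced per turn: a full sulfation/desulfation cycle returns the acceptor unchanged but is not energetically free.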

ARS enzymes and their genes have been associated with specific genetic
diseases. ARSA is located in the lysosomes and removes sulfate from sulfated
glycolipids. A deficiency of ARSA has been associated with metachromatic
leukodystrophy and multiple sulfatase deficiency (MSD). ARSB is located in lysosomes
and has dermatan sulfate and chondroitin sulfate as endogenous substrates. A
deficiency of ARSB is associated with Maroteaux-Lamy syndrome and MSD. ARSC is
located in the endoplasmic reticulum and has cholesterol sulfate and steroid sulfates as
endogenous substrates. A deficiency of ARSC is associated with X-linked
ichthyosis and MSD. ARSD may be associated with MSD. ARSE has been associated
with chondrodysplasia punctata and MSD. ARSF may be associated with MSD. ARSC
hydrolyzes sulfate esters on a wide range of steroids and cholesterol. ARSs also
hydrolyze sulfate conjugates of xenobiotics.

MSD results from an inability to perform a co- or post-translational modification
of a cysteine residue to serine semialdehyde (Cα-formylglycine; 2-amino-3-oxopropanoic
acid). This residue is conserved in all eukaryotic sulfatases described by Coughtrie et al.
ARSC may have a very broad specificity, extending to iodothyronine sulfates and a number
of sulfate conjugates of xenobiotic phenols.

The kinetic and catalytic properties of isolated ARS enzymes, which are important for
understanding substrate specificity and the physical and chemical properties of enzymes
and substrates that determine substrate preference, have recently been characterized using
recombinant enzyme systems. COS and V79 cells have been used to express the human
sulfotransferases. Coughtrie et al. have constructed and characterized V79
cell lines stably expressing ARSA, ARSB, and ARSC. These cell lines exhibited the
expected substrate preferences of the three enzymes among the substrates 4-nitrocatechol
sulfate, estrone sulfate, and dehydroepiandrosterone sulfate (DHEAS).

The sulfation of small molecules can be broadly divided into the areas of
chemical defense, hormone biosynthesis, and bioactivation. Sulfation was originally viewed
as protecting against the toxic effects of xenobiotics, because sulfate conjugates
are more readily excreted in urine or bile and generally exhibit reduced
pharmacological or biological activity relative to the parent compound. Many drugs and
other xenobiotics are conjugated with sulfate, and many phenolic metabolites of the
cytochrome P450 mono-oxygenase system are excreted as sulfate conjugates.

Further, potent endogenous chemicals, such as steroids and catecholamines, are
found at high levels as circulating sulfate conjugates. For example, greater than 90% of
circulating dopamine exists in the sulfated form. Sulfation is also suggested to play a
role in the inactivation of potent steroids such as estrogens and androgens. Accordingly,
sulfation is important in the metabolism and homeostasis of such compounds in humans.

DHEAS is the major circulating steroid in humans, and estrone sulfate is the
major circulating estrogen. These chemicals act as precursors of estrogens and androgens.
Extremely large quantities of such steroids or estrogens may occur during various stages
of development, such as pregnancy. Estrone sulfate is a precursor for β-estradiol
synthesis. In breast cancer cells, it is hydrolyzed by steroid sulfatase (ARSC) to estrone,
which is then converted to β-estradiol by the action of another enzyme. Accordingly, ARSC
is important for maintaining active estrogen, and it is thus an important therapeutic target
for the treatment of breast cancer.

Cholesterol sulfate, synthesized in the epidermis, may have a role in
keratinocyte differentiation. Accordingly, hydrolysis of cholesterol sulfate by steroid
sulfatase may be important in skin formation and differentiation. The skin is the major organ
affected in X-linked ichthyosis, which is caused by mutations in ARSC.

Although sulfation may widely serve to detoxify potent compounds, some sulfate 
conjugates are more biologically active than the corresponding parent compound. 
Minoxidil and cicletanine are activated upon sulfation. Further, an inhibitor of ARSC
was shown to potentiate the memory enhancing effect of DHEAS. This suggests a role 
for sulfates and sulfation in the central nervous system. 
An important example of bioactivation by sulfation, however, occurs
with dietary and environmental mutagens and carcinogens. For a large number of these,
sulfation is the terminal step in the pathway to metabolic activation. Examples of such
chemicals include aromatic amines (including heterocyclic amines) and benzylic
alcohols of chemicals such as polycyclic aromatic hydrocarbons, safrole, and estragole.

The sulfatase gene family has been reviewed by Parenti et al. (Current Opinion in
Genetics and Development 7: 386-391 (1997)), as summarized below.

Members of the sulfatase family of enzymes are functionally and structurally similar.
Nevertheless, these enzymes catalyze the hydrolysis of sulfate ester bonds in a wide
variety of substrates, ranging from complex molecules such as glycosaminoglycans and
sulfolipids to steroid sulfates (see also Coughtrie et al., above). Several human genetic
disorders result from the accumulation of sulfated intermediates caused by a deficiency
of single or multiple sulfatase activities. A subset of the sulfatases, the ARS, is
characterized by the ability to hydrolyze sulfate esters of chromogenic or fluorogenic
aromatic compounds such as p-nitrocatechol sulfate and 4-methylumbelliferyl sulfate.
Desulfation is required to degrade glycosaminoglycans (such as heparan sulfate,
chondroitin sulfate, and dermatan sulfate) and sulfolipids. Steroid sulfatase differs from
other members of the family with respect to subcellular localization: it is localized in the
microsomes rather than in lysosomes. Further, ARSD, ARSE, and ARSF are also
non-lysosomal, being localized in the endoplasmic reticulum or Golgi compartment.

The natural substrate of ARSA is cerebroside sulfate; the associated diseases are
MLD and MSD. The natural substrate of ARSB is dermatan sulfate; the associated diseases
are MPS VI and MSD. The natural substrates of ARSC/STS are sulfated steroids; the
associated diseases are XLI and MSD. The natural substrates of ARSD-F are unknown.
The natural substrates of iduronate-2-sulfate sulfatase (IDS) are dermatan sulfate and
heparan sulfate; the associated diseases are MPS II and MSD. The natural substrates of
galactose 6-sulfatase are keratan sulfate and chondroitin 6-sulfate; the associated diseases
include MPS IVA and MSD. The natural substrates of glucosamine-6-sulfatase are heparan
sulfate and keratan sulfate; the associated diseases are MPS IIID and MSD. The
natural substrate of glucuronate-2-sulfatase is heparan sulfate. The natural substrate of
glucosamine-3-sulfatase is heparan sulfate.

Sulfatases are activated through conversion of a cysteine residue, as described
above. The conversion is required for catalytic activity and is defective in MSD. It is
likely that all sulfatases undergo the same modification. Substitution of this cysteine
was shown to destroy the enzymatic activity of N-acetylgalactosamine-4-sulfatase
(ARSB). The modified residue and a metal ion have been shown to be located at the
base of a substrate-binding pocket.

Nine human sulfatase genes are known, and murine, rat, goat, or avian orthologs
have been identified for some of them. A high degree of similarity occurs particularly in
the amino-terminal region, which accordingly contains a potential consensus sulfatase
signature.
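As a hedged illustration of how a candidate catalytic cysteine might be located, the sketch below searches a protein sequence for the five-residue core C-X-P-X-R. That motif is an assumption drawn from the general sulfatase literature on formylglycine generation, not from this document, and a match marks only a candidate position. The function name is hypothetical.

```python
import re

# Assumed core motif C-X-P-X-R (X = any residue). The zero-width lookahead
# lets overlapping candidates be reported.
CORE_MOTIF = re.compile(r"(?=C.P.R)")


def candidate_catalytic_cys(seq):
    """Return 1-based positions of cysteines that start a C-X-P-X-R core."""
    return [m.start() + 1 for m in CORE_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence for demonstration only; not derived from SEQ ID NOS:1-8.
    print(candidate_catalytic_cys("MKLACTPSRAAGC"))  # [5]
```

A natural refinement would be to keep only hits that fall within the amino-terminal region where the consensus signature is expected.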

Sulfatases, as discussed above, are associated with human disease. Most
sulfatase deficiencies cause lysosomal storage disorders. The mucopolysaccharidoses,
caused by deficiencies of sulfatases acting on glycosaminoglycans, present with various
combinations of mental retardation, facial dysmorphisms, skeletal deformities,
hepatosplenomegaly, and deformities of soft tissues. In metachromatic leukodystrophy, a
deficiency of ARSA causes the storage of sulfolipids in the central and peripheral
nervous systems, leading to neurologic deterioration. X-linked ichthyosis is caused by
STS deficiency, leading to increased cholesterol sulfate levels. MSD, a disorder in which
all sulfatase activities are simultaneously defective, was shown to result from a defect in
the co- or post-translational processing of sulfatases.

Accordingly, sulfatases are a major target for drug action and development. 
Therefore, it is valuable to the field of pharmaceutical development to identify and 
characterize previously unknown sulfatases. The present invention advances the state of 
the art by providing previously unidentified human sulfatases.


SUMMARY OF THE INVENTION 

Novel sulfatase nucleotide sequences and the deduced sulfatase polypeptides
are described herein. Accordingly, the invention provides isolated sulfatase nucleic acid
molecules having the sequences shown in SEQ ID NOS:2, 4, 6, and 8 or in the cDNA
deposited with ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or
______, respectively ("the deposited cDNA"), and variants and fragments thereof.

It is also an object of the invention to provide nucleic acid molecules encoding
the sulfatase polypeptides, and variants and fragments thereof. Such nucleic acid
molecules are useful as targets and reagents in sulfatase expression assays, are applicable
to the treatment and diagnosis of sulfatase-related disorders, and are useful for producing
novel sulfatase polypeptides by recombinant methods.

The invention thus further provides nucleic acid constructs comprising the
nucleic acid molecules described herein. In a preferred embodiment, the nucleic acid
molecules of the invention are operatively linked to a regulatory sequence. The
invention also provides vectors and host cells for expressing the sulfatase nucleic acid
molecules and polypeptides, and particularly recombinant vectors and host cells.

In another aspect, it is an object of the invention to provide isolated sulfatase
polypeptides and fragments and variants thereof, including a polypeptide having the
amino acid sequence shown in SEQ ID NOS:1, 3, 5, or 7 or the amino acid sequences
encoded by the deposited cDNAs. The disclosed sulfatase polypeptides are useful as
reagents or targets in sulfatase assays and are applicable to the treatment and diagnosis of
sulfatase-related disorders.

The invention also provides assays for determining the activity or the presence
or absence of the sulfatase polypeptides or nucleic acid molecules in a biological sample,
including for disease diagnosis. In addition, the invention provides assays for
determining the presence of a mutation in the polypeptides or nucleic acid molecules,
including for disease diagnosis.

A further object of the invention is to provide compounds that modulate
expression of the sulfatases for the treatment and diagnosis of sulfatase-related disorders.
Such compounds may be used to treat conditions related to aberrant activity or
expression of the sulfatase polypeptides or nucleic acids.

The disclosed invention further relates to methods and compositions for the
study, modulation, diagnosis, and treatment of sulfatase-related disorders. The
compositions include sulfatase polypeptides, nucleic acids, vectors, transformed cells,
and related variants thereof. In particular, the invention relates to the diagnosis and
treatment of sulfatase-related disorders including, but not limited to, disorders as
described in the background above or further herein, or disorders involving a tissue shown
in the figures herein.
In yet another aspect, the invention provides antibodies or antigen-binding
fragments thereof that selectively bind the sulfatase polypeptides and fragments. Such
antibodies and antigen-binding fragments have use in the detection of the sulfatase
polypeptides and in the prevention, diagnosis, and treatment of sulfatase-related disorders.

The sulfatases disclosed herein are designated as follows: 22438, 23553, 25278,
and 26212.

DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the 22438 sulfatase cDNA sequence (SEQ ID NO:2) and the
deduced amino acid sequence (SEQ ID NO:1). The 22438 sulfatase coding sequence is
set forth in SEQ ID NO:11.

Figure 2 shows a 22438 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:1) of
22438 sulfatase are indicated. Polypeptides of the invention include fragments which
include all or part of a hydrophobic sequence (a sequence above the dashed line) or
all or part of a hydrophilic sequence (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.
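The document does not name the hydropathy scale or window used for these plots. The following is a minimal sketch, assuming the Kyte-Doolittle scale, a 9-residue sliding window, and a dashed line at a hydropathy of zero, of how such a trace and its above-the-line (hydrophobic) segments could be computed. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return (center_position, mean_hydropathy) for every full window.

    Positions are 1-based. Non-standard residue letters raise KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = seq.upper()
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


def hydrophobic_segments(profile, threshold=0.0):
    """Merge consecutive window centers scoring above threshold into (first, last) runs."""
    runs = []
    for pos, score in profile:
        if score > threshold:
            if runs and runs[-1][1] == pos - 1:
                runs[-1] = (runs[-1][0], pos)  # extend the current run
            else:
                runs.append((pos, pos))        # start a new run
    return runs
```

With a 9-residue window, the first and last four residues receive no value, and a different scale or window would shift the trace, so the output is indicative rather than a reproduction of the plotted figure.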

Figure 3 shows an analysis of the 22438 sulfatase amino acid sequence: alpha, beta,
turn, and coil regions; hydrophilicity; amphipathic regions; flexible regions; antigenic
index; and surface probability plot.

Figure 4 shows an analysis of the 22438 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For N-myristoylation sites, the actual modified residue
is the first amino acid. In addition, an amidation site is found from about amino acids
56-59, an EGF-like domain cysteine pattern signature is found from about amino acids
260-271, and a sulfatase signature is found from about amino acids 129-138.
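The first/last residue conventions above match the style of PROSITE pattern annotations. The sketch below, written under that assumption, translates several standard PROSITE patterns (PS00001, PS00004, PS00005, PS00006, PS00008, PS00009) into Python regular expressions and reports, for each hit, the position of the modified residue using the same convention. The helper names are hypothetical, and the tyrosine kinase and EGF-like patterns are omitted for brevity. These short patterns match frequently by chance, so hits indicate potential sites only.

```python
import re

# PROSITE-style patterns as regular expressions.
# Each entry: (label, regex, 0-based index of the modified residue in the match).
SITE_PATTERNS = [
    ("N-glycosylation", r"N[^P][ST][^P]", 0),                   # PS00001 N-{P}-[ST]-{P}
    ("cAMP/cGMP kinase", r"[RK]{2}.[ST]", 3),                   # PS00004 [RK](2)-x-[ST]
    ("protein kinase C", r"[ST].[RK]", 0),                      # PS00005 [ST]-x-[RK]
    ("casein kinase II", r"[ST].{2}[DE]", 0),                   # PS00006 [ST]-x(2)-[DE]
    ("N-myristoylation", r"G[^EDRKHPFYW].{2}[STAGCN][^P]", 0),  # PS00008
    ("amidation", r".G[RK]{2}", 0),                             # PS00009 x-G-[RK]-[RK]
]


def scan_sites(seq):
    """Return (label, start, end, modified_position) tuples, all 1-based."""
    seq = seq.upper()
    hits = []
    for label, pattern, mod_index in SITE_PATTERNS:
        # A zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            end = m.start() + len(m.group(1))
            hits.append((label, start, end, start + mod_index))
    return hits


if __name__ == "__main__":
    # Toy sequence for demonstration only; not derived from SEQ ID NOS:1-8.
    for hit in scan_sites("MNGSARRASAKG"):
        print(hit)
```

Reporting the modified position separately from the match span mirrors the figure legends, where a site is described by its span while only one residue within it is modified.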

Figure 5 shows the 23553 sulfatase cDNA sequence (SEQ ID NO:4) and the 
deduced amino acid sequence (SEQ ID NO:3). The 23553 sulfatase coding sequence is 
set forth in SEQ ID NO:12.


Figure 6 shows a 23553 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:3) of
23553 sulfatase are indicated. Polypeptides of the invention include fragments which
include all or part of a hydrophobic sequence (a sequence above the dashed line) or
all or part of a hydrophilic sequence (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.


Figure 7 shows an analysis of the 23553 sulfatase amino acid sequence: alpha, beta,
turn, and coil regions; hydrophilicity; amphipathic regions; flexible regions; antigenic
index; and surface probability plot.

Figure 8 shows an analysis of the 23553 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For protein kinase C
phosphorylation sites, the actual modified residue is the first amino acid. For casein
kinase II phosphorylation sites, the actual modified residue is the first amino acid.
For the tyrosine kinase phosphorylation site, the actual modified residue is the last
amino acid residue. For N-myristoylation sites, the actual modified residue is the first
amino acid. In addition, a sulfatase signature is found from about amino acids 85-97.

Figure 9 shows relative expression of the 23553 sulfatase mRNA in normal and
cancerous human tissues. 

Figure 10 shows the 25278 sulfatase cDNA sequence (SEQ ID NO:6) and the

deduced amino acid sequence (SEQ ID NO:5). The 25278 sulfatase coding sequence is 
set forth in SEQ ID NO:13. 

Figure 11 shows a 25278 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:5) of
25278 sulfatase are indicated. Polypeptides of the invention include fragments which
include all or part of a hydrophobic sequence (a sequence above the dashed line) or
all or part of a hydrophilic sequence (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.

Figure 12 shows an analysis of the 25278 sulfatase amino acid sequence:
alpha, beta, turn, and coil regions; hydrophilicity; amphipathic regions; flexible regions;
antigenic index; and surface probability plot.

Figure 13 shows an analysis of the 25278 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For the tyrosine kinase phosphorylation site, the actual
modified residue is the last amino acid residue. For N-myristoylation sites, the actual
modified residue is the first amino acid. In addition, amidation sites are found from
about amino acids 312-315 and 541-544, and sulfatase signatures are found from
about amino acids 139-148 and 91-103.

Figure 14 shows relative expression of 25278 sulfatase mRNA in normal and
cancerous human tissues.

Figure 15 shows the 26212 sulfatase cDNA sequence (SEQ ID NO:8) and the
deduced amino acid sequence (SEQ ID NO:7). The 26212 sulfatase coding sequence is 
set forth in SEQ ID NO:14. 


Figure 16 shows a 26212 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:7) of
26212 sulfatase are indicated. Polypeptides of the invention include fragments which
include all or part of a hydrophobic sequence (a sequence above the dashed line) or
all or part of a hydrophilic sequence (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.


Figure 17 shows an analysis of the 26212 sulfatase amino acid sequence:
alpha, beta, turn, and coil regions; hydrophilicity; amphipathic regions; flexible regions;
antigenic index; and surface probability plot.

Figure 18 shows an analysis of the 26212 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For the tyrosine kinase phosphorylation site, the actual
modified residue is the last amino acid residue. For N-myristoylation sites, the actual
modified residue is the first amino acid. In addition, sulfatase signature sites are
found from about amino acids 168-177 and 120-132.



Figure 19 depicts an alignment of the 22438 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper 
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower 
amino acid sequence corresponds to amino acids 36 to 462 of SEQ ID NO:1.

Figure 20 depicts an alignment of the 23553 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper 
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower
amino acid sequence corresponds to amino acids 43 to 467 of SEQ ID NO:3.

Figure 21 shows the expression of 23553 in the following human carcinoma
cell lines: breast cancer cell lines MCF-7, ZR75, T47D, MDA231, and MDA435;
colon cancer cell lines DLD-1, SW480, SW620, HCT116, HT29, and Colo205; and lung
cancer cell lines NCIH125, NCIH69, NCIH322, NCIH460, and A549. Expression
levels were determined by reverse transcriptase (RT) quantitative PCR (TaqMan®
brand quantitative PCR kit, Applied Biosystems). The quantitative PCR reactions
were performed according to the kit manufacturer's instructions.
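The text does not state how the TaqMan cycle-threshold (Ct) values were converted to relative expression. One widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below under its usual assumptions: a reference (housekeeping) transcript measured in the same samples, a calibrator sample such as a normal tissue, and near-100% amplification efficiency for both assays. The function name and the example Ct values are hypothetical.

```python
def fold_change_ddct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression of a target transcript by the comparative Ct method.

    ct_target, ct_ref:         Ct of the target and reference gene in the sample
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator sample
    Assumes both amplicons double every cycle (efficiency ~100%).
    """
    delta_sample = ct_target - ct_ref            # normalize sample to reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal  # normalize calibrator likewise
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: a tumor sample versus a normal-tissue calibrator.
    tumor = fold_change_ddct(ct_target=24.0, ct_ref=18.0,
                             ct_target_cal=26.0, ct_ref_cal=18.0)
    print(f"fold change vs. calibrator: {tumor:.1f}")  # 4.0
```

A lower Ct means more starting template, so a target that crosses threshold two cycles earlier (relative to the reference gene) than in the calibrator corresponds to roughly a fourfold increase.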

Figure 22 shows the expression of 23553 in clinical samples of normal human
breast tissue and the following human breast tumor tissues: ductal carcinoma in situ
(DCIS), invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC).
Expression levels were determined as described in the description of Figure 21.

Figure 23 shows the expression of 23553 in human clinical samples of normal
colon, colon tumor, metastatic liver, and normal liver tissue. Expression levels were
determined as described in the description of Figure 21.

Figure 24 shows the expression of 23553 in normal human lung and
adenocarcinoma (AC) and squamous cell carcinoma (SCC) lung tumor tissue. 
Expression levels were determined as described in the description of Figure 21. 



Figure 25 shows the expression of 23553 in the following normal human
tissues: prostate (column 1), liver (columns 2 and 3), breast (columns 4 and 5),
skeletal muscle (column 6), brain (columns 7 and 8), colon (columns 9 and 10), heart
(columns 11 and 12), ovary (columns 13 and 14), kidney (columns 15 and 16), lung
(columns 17 and 18), vein (columns 19 and 20), trachea (column 21), adipose
(columns 22 and 23), small intestine (column 24), thyroid (columns 25 and 26), skin
(columns 27 and 28), testes (column 29), placenta (column 30), fetal liver (columns
31 and 32), fetal heart (columns 33 and 34), osteoblasts (undifferentiated, column 35;
primary culture, column 36), fetal spinal cord (column 38), cervix (column 39),
spleen (column 40), spinal cord (column 41), thymus (column 42), tonsil (column 43),
lymph node (column 44), and aorta (column 45). 23553 was expressed at high levels
in trachea, vein, osteoblast, kidney, and testes tissue; significant expression of 23553
was also noted in adipose, colon, skeletal muscle, thyroid, and prostate tissues. Expression
levels were determined as described in the description of Figure 21.

Figure 26 shows the expression of 23553 in the following human tissues:
normal brain (column 1), glioblastoma (columns 2-5), normal breast (column 6),
breast tumor (columns 7-9), normal colon (column 10), colon tumor (columns 11-13),
normal liver (column 14), metastatic colon (columns 15 and 16), normal lung (column
17), lung tumor (columns 18-20), placenta (column 21), fetal adrenal gland (column
22), normal skin (columns 23 and 24), and adipose (column 25). 23553 was
detectable in all tissues tested, with evidence of increased expression levels in breast,
colon, and lung tumors. In addition, 23553 was expressed at an elevated level in
glioblastoma tissue as compared to normal brain tissue. Expression levels were
determined as described in the description of Figure 21.


Figure 27 depicts an alignment of the 25278 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper 
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower
amino acid sequence corresponds to amino acids 47 to 471 of SEQ ID NO:5. 



Figure 28 shows the relative expression of 25278 in various human tissues, as
follows. Row 1, NDR 19, breast, DCIS (ductal carcinoma in situ); Row 2, MDA
138, breast, normal; Row 3, NDR 01, breast, IDC (invasive ductal carcinoma); Row
4, NDR 15, breast, DC (ductal carcinoma); Row 5, NDR 133, breast, ILC (invasive
lobular carcinoma); Row 6, MDA 161, breast, IDC; Row 7, MDA 155, breast,
IDC/DCIS; Row 8, PIT 270, lung, normal; Row 9, CHT 427, lung, normal; Row 10,
PIT 241, lung, normal; Row 11, PIT 298, lung, normal; Row 12, CHT 800, lung, AC
(adenocarcinoma); Row 13, CHT 335, lung, SCC (squamous cell carcinoma); Row
14, CHT 447, lung, AC; Row 15, CHT 752, lung, AC; Row 16, CHT 799, lung, AC;
Row 17, CHT 369, lung, SCC; Row 18, CHT 369, lung, SCC; Row 19, CHT 371,
colon, normal; Row 20, CHT 396, colon, normal; Row 21, CHT 398, colon, normal;
Row 22, NDR 104, colon, normal; Row 23, CHT 520, colon, adenocarcinoma; Row
24, CHT 122, colon, adenocarcinoma; Row 25, CHT 536, colon, adenocarcinoma;
Row 26, CHT 528, colon, adenocarcinoma; Row 27, CHT 386, colon,
adenocarcinoma; Row 28, CHT 372, colon, adenocarcinoma; Row 29, CHT 532,
colon, adenocarcinoma; Row 30, CHT 77, liver, metastatic; Row 31, CHT 321, liver,
metastatic; Row 32, CHT 84, liver, metastatic; Row 33, NDR 100, liver, metastatic;
Row 34, NDR 154, liver, normal; Row 35, CHT 322, liver, normal; Row 36, PIT 51,
liver, normal; Row 37, CHT 339, liver, normal; Row 38, PIT 265, breast, normal;
Row 39, MDA 335, breast, normal; Row 40, NDR 132, breast, DCIS; Row 41, NDR
13, breast, normal; Row 42, NDR 56, breast, normal.


Figure 29 depicts an alignment of the 26212 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper 
sequence is the consensus amino acid sequence (SEQ ID NO:10), while the lower
amino acid sequence corresponds to amino acids 76 to 502 of SEQ ID NO:7. 


Figure 30 shows the expression of 26212 in various human endothelial cells, as
follows: proliferating human umbilical vein endothelial cells (HUVEC) (column 1);
arresting HUVEC (column 2); HUVEC minus growth factor (column 3); proliferating
cardiac human microvascular endothelial cells (HMVEC) (columns 4 and 6); arresting
cardiac HMVEC (columns 5 and 7); proliferating lung HMVEC (columns 8, 11, and
13); arresting lung HMVEC (columns 9, 12, and 14); lung HMVEC minus growth
factor (columns 10 and 15); and HEK 293 (non-endothelial) cells (column 16). In six of six
independent experiments, 26212 is up-regulated in proliferating endothelial cells as
compared to arrested endothelial cells. Further, 26212 expression levels are higher in
proliferating endothelial cells than in HEK 293 (non-endothelial) cells. Expression
levels were determined as described in the description of Figure 21.


Figure 31 shows the expression of 26212 in the following human tissues. Figure
31A: normal breast (columns 1 and 2), breast tumor (columns 3-9), normal ovary
(columns 10 and 11), ovary tumor (columns 12-19), normal lung (columns 20-23), and lung
tumor (columns 24-31). Figure 31B: normal colon (columns 1-4), colon tumor (columns
5-12), liver metastases (columns 13-16), normal liver (columns 17-18), normal brain
(columns 19-20), astrocyte (column 21), brain tumor (columns 22-25), arresting human
microvascular endothelial cells (column 26), proliferating human microvascular
endothelial cells (column 27), placenta (column 28), fetal adrenal tissue (columns
29-30), and fetal liver (column 31). Expression levels were determined as described in
the description of Figure 21.

Figure 32 shows 26212 expression in normal human clinical breast samples 
(columns 1 and 2) and human clinical breast tumor samples (columns 3-9). Expression 
levels were determined as described in the description of Figure 21.

Figure 33 shows 26212 expression in normal human clinical lung samples
(columns 1-4) and human clinical lung tumor samples (columns 5-12). Expression
levels were determined as described in the description of Figure 21.

Figure 34 shows the temporal expression of 26212 in human normal and breast
cancer epithelial cell lines (MCF10A and MCF3B, respectively) after treatment with
epidermal growth factor (EGF). MCF10A cells are shown 0, 0.5, 1, 2, 4, and 8 hours
after treatment with EGF (columns 1-6, respectively). Similarly, MCF3B cells are
shown 0, 0.5, 1, 2, 4, and 8 hours after treatment with EGF (columns 7-12, respectively).
26212 is up-regulated in both cell lines. Expression levels were determined as
described in the description of Figure 21.

Figure 35 shows expression of 26212 in human hemangiomas and other
angiogenic tissues: hemangioma (ONC 101; column 1); hemangioma (ONC 102;
column 2); hemangioma (ONC 103; column 3); skin (NDR 295; column 4); fetal heart
(BWH4; column 5); normal heart (MPI 849; column 6); spinal cord (CKN 746; column
7); uterine adenocarcinoma (CHT 1424; column 8); and endometrial polyps (CLN 944;
column 9). Expression levels were determined as described in the description of
Figure 21.



Figure 36 shows expression of 26212 in the following human tissues: normal
artery (column 1), normal vein (column 2), aortic smooth muscle cells (SMC), early
(column 3), coronary SMC (column 4), static human umbilical vein endothelial cells
(HUVEC) (column 5), shear HUVEC (column 6), normal heart (column 7), heart,
congestive heart failure (CHF) (column 8), kidney (column 9), skeletal muscle (column
10), normal adipose (column 11), pancreas (column 12), primary osteoblasts (column
13), osteoclasts, differentiated (column 14), normal skin (column 15), normal spinal cord
(column 16), normal brain cortex (column 17), normal brain hypothalamus (column 18),
nerve (column 19), dorsal root ganglion (DRG) (column 20), glial cells (astrocytes)
(column 21), glioblastoma (column 22), normal breast (column 23), breast tumor
(column 24), normal ovary (column 25), ovary tumor (column 26), normal prostate
(column 27), prostate tumor (column 28), prostate epithelial cells (column 29), normal
colon (column 30), colon tumor (column 31), normal lung (column 32), lung tumor
(column 33), lung, chronic obstructive pulmonary disease (COPD) (column 34), colon,
inflammatory bowel disease (IBD) (column 35), normal liver (column 36), liver fibrosis
(column 37), dermal cells, fibroblasts (column 38), normal spleen (column 39), normal
tonsil (column 40), lymph node (column 41), small intestine (column 42), skin,
decubitus (column 43), synovium (column 44), bone marrow mononuclear cells
(BM-MNC) (column 45), and activated peripheral blood mononuclear cells (PBMC)
(column 46). The expression levels of 26212 are higher in endothelial and glial cells
than in other tissues and cells. Expression levels were determined as described in the
description of Figure 21.

DETAILED DESCRIPTION OF THE INVENTION

Sulfatase Polypeptides 

The invention is based on the identification of the novel human 22438
sulfatase. In situ hybridization experiments showed that this sulfatase is expressed in
the following monkey tissues: sub-populations of DRG neurons (mainly in small and
medium-sized neurons), spinal cord (interneurons and motor neurons), and
brain. The sulfatase is also expressed in human brain. The sulfatase cDNA was
identified based on consensus motifs or protein domains characteristic of sulfatases
and, in particular, arylsulfatases. BLAST analysis has shown homology with human
arylsulfatase E, a human iduronate-2-sulfatase, human N-acetylgalactosamine-6-sulfatase,
murine arylsulfatase A, and human arylsulfatase A. However, some
homology has also been found with other arylsulfatases from various mammalian
species, including, but not limited to, human arylsulfatases D, E, F, and B.

The invention is also based on the identification of the novel human 23553
sulfatase. Taqman analysis has shown positive differential expression in breast and
colon cancer and in colonic metastases to the liver (Figure 9). This sulfatase has been
identified as a glucosamine-6-sulfatase based on ProDom matches and BLAST
analysis. Some homology has also been found to human arylsulfatase A, human
N-acetylglucosamine-6-sulfatase, and human iduronate-2-sulfatase.

The invention is also based on the identification of the novel human 25278
sulfatase. The sulfatase is differentially expressed in human colon cancer and in
colonic metastases to the liver, as determined by Taqman analysis. This sulfatase has
been identified as an N-acetylgalactosamine-4-sulfatase by ProDom matching and
BLAST homology alignment. Further, based on BLAST analysis, some homology
has also been shown to arylsulfatase B and arylsulfatase A.

The invention is also based on the identification of the novel human 26212
sulfatase. This sulfatase has been identified as an arylsulfatase by ProDom matching
and BLAST sequence alignment. Homology has been shown to arylsulfatase B.
Some homology has also been found with arylsulfatases F, E, D, and A, as well as with
iduronate-2-sulfatase. Arylsulfatase B is also known as N-acetylgalactosamine-4-sulfatase.

Specifically, newly identified human genes, termed 22438, 23553, 25278, and
26212 sulfatases, are provided. These sequences, and other nucleotide sequences
encoding the sulfatase proteins or fragments and variants thereof, are referred to as
"22438, 23553, 25278, and 26212 sulfatase sequences."

Plasmids containing the sulfatase cDNA inserts were deposited with the Patent
Depository of the American Type Culture Collection (ATCC), 10801 University
Boulevard, Manassas, Virginia, on , April 5, 2000, May 9, 2000, or , and
assigned Patent Deposit Numbers , PTA-1639, PTA-1846, or , respectively.

The deposits will be maintained under the terms of the Budapest Treaty on the
International Recognition of the Deposit of Microorganisms for the Purposes of
Patent Procedure. The deposits were made merely as a convenience for those of skill
in the art and are not an admission that a deposit is required under 35 U.S.C. §112.

The sulfatase cDNA was identified in human cDNA libraries. Specifically,
expressed sequence tags (ESTs) found in human cDNA libraries were selected based on
homology to known sulfatase sequences. Based on such EST sequences, primers were
designed to identify a full-length clone from a human cDNA library. Positive clones
were sequenced and the overlapping fragments were assembled. The 22438, 23553,
25278, and 26212 sulfatase amino acid sequences are shown in Figures 1, 5, 10, and 15,
respectively, and SEQ ID NOS:1, 3, 5, and 7. The 22438, 23553, 25278, and 26212
sulfatase cDNA sequences are shown in Figures 1, 5, 10, and 15 and SEQ ID NOS:2, 4,
6, and 8.

Analysis of the assembled sequences revealed that the cloned cDNA
molecules encoded sulfatase-like polypeptides. BLAST analysis indicated that the
23553 sulfatase is a glucosamine-6-sulfatase, that the 25278 sulfatase is an
N-acetylgalactosamine-4-sulfatase, that the 22438 sulfatase is an arylsulfatase with highest
homology to the arylsulfatase A and E genes, and that the 26212 sulfatase is an
arylsulfatase with highest homology to the arylsulfatase B gene
(N-acetylgalactosamine-4-sulfatase).
The sulfatase sequences of the invention belong to the sulfatase family of
molecules having conserved functional features. The term "family" when referring to
the proteins and nucleic acid molecules of the invention is intended to mean two or
more proteins or nucleic acid molecules having sufficient amino acid or nucleotide
sequence identity as defined herein to provide a specific function. Such family
members can be naturally occurring and can be from either the same or different
species. For example, a family can contain a first protein of murine origin and an
ortholog of that protein of human origin, as well as a second, distinct protein of
human origin and a murine ortholog of that protein.

The 22438 sulfatase gene encodes an approximately 2175 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:2. This transcript has
an open reading frame which encodes a 525 amino acid protein (SEQ ID NO:1).

The 23553 sulfatase gene encodes an approximately 4321 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:4. This transcript has
an open reading frame which encodes an 871 amino acid protein (SEQ ID NO:3).

The 25278 sulfatase gene encodes an approximately 2940 nucleotide mRNA 
transcript having the corresponding cDNA set forth in SEQ ID NO:6. This transcript has 
an open reading frame which encodes a 569 amino acid protein (SEQ ID NO:5). 

The 26212 sulfatase gene encodes an approximately 2253 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:8. This transcript has
an open reading frame which encodes a 599 amino acid protein (SEQ ID NO:7).
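Each paragraph above relates a transcript to the protein encoded by its open reading frame. As an illustration only, the following Python sketch locates the longest forward-strand open reading frame in a cDNA string, assuming the standard genetic code (ATG start; TAA, TAG, and TGA stops). It is not the procedure the applicants used. The function names are hypothetical, and the toy sequence stands in for SEQ ID NOS:2, 4, 6, and 8, which are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Returns (-1, -1) when no complete ORF is found.
    """
    seq = cdna.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def encoded_length(orf: tuple[int, int]) -> int:
    """Amino acids encoded by an ORF, excluding the stop codon."""
    start, end = orf
    return 0 if start < 0 else (end - start) // 3 - 1


orf = longest_orf("CCATGAAATTTGGGTAACC")
print(orf, encoded_length(orf))  # (2, 17) 4
```

By the same arithmetic, a 525 amino acid protein such as 22438 implies a coding region of (525 + 1) × 3 = 1578 nucleotides, including the stop codon. This fits within the approximately 2175 nucleotide transcript.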

Prosite program analysis was used to predict various sites within the 22438 
sulfatase protein as shown in Figure 4. 

Prosite program analysis was used to predict various sites within the 23553
sulfatase protein as shown in Figure 8.

Prosite program analysis was used to predict various sites within the 25278
sulfatase protein as shown in Figure 13.

Prosite program analysis was used to predict various sites within the 26212
sulfatase protein as shown in Figure 18.
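The document does not say which Prosite patterns were matched. As a hedged illustration of how a Prosite-style pattern can be scanned against a protein sequence, the sketch below translates the core Prosite pattern syntax into a Python regular expression: x, [..], {..}, repeat counts, and the < and > terminus anchors. The N-glycosylation pattern N-{P}-[ST]-{P} (PROSITE PS00001) is used only as an example. The protein string is a toy sequence, and the helper names are hypothetical. Terminus markers inside brackets, such as [G>], are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_n_term = element.startswith("<")
        at_c_term = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue spec from an optional repeat such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            rx = "."  # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            rx = core  # a single residue or [ABC] is already valid regex
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_n_term else "") + rx + ("$" if at_c_term else ""))
    return "".join(parts)


def find_sites(pattern: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched segment) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


print(find_sites("N-{P}-[ST]-{P}.", "MKNGSANTTL"))  # [(3, 'NGSA'), (7, 'NTTL')]
```

The lookahead wrapper lets adjacent or overlapping sites be reported separately, since Prosite-style scans list every position at which a pattern starts.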

In situ hybridization experiments showed that 22438 is expressed in
subpopulations of DRG neurons, spinal cord, and brain, as disclosed hereinabove.
Expression of the 22438 sulfatase mRNA in the above cells and tissues
indicates that the sulfatase is likely to be involved in the proper function of and in
disorders involving these tissues. Accordingly, the disclosed invention further relates
to methods and compositions for the study, modulation, diagnosis and treatment of
sulfatase-related disorders, especially disorders of these tissues that include, but are
not limited to, those disclosed herein.

The 23553 sulfatase is differentially expressed in breast and colon cancer and
in colonic metastases to the liver. Accordingly, the disclosed invention further relates
to methods and compositions for the study, modulation, diagnosis and treatment of
disorders in these tissues (normal and tumor).

The 25278 sulfatase is differentially expressed in colon tumors and colonic
metastases to the liver. Accordingly, the disclosed invention further relates to
methods and compositions for the study, modulation, diagnosis and treatment of
disorders in these normal and tumor tissues.

The 26212 sulfatase is differentially expressed in colon metastases and lung
tumors. Accordingly, the disclosed invention further relates to methods and
compositions for the study, modulation, diagnosis and treatment of disorders in these
normal and tumor tissues.

The compositions include sulfatase polypeptides, nucleic acids, vectors,
transformed cells and related variants and fragments thereof, as well as agents that
modulate expression of the polypeptides and polynucleotides. In particular, the
invention relates to the modulation, diagnosis and treatment of sulfatase-related
disorders as described herein.

Treatment is defined as the application or administration of a therapeutic agent
to a patient, or application or administration of a therapeutic agent to an isolated tissue
or cell line from a patient, who has a disease, a symptom of disease or a predisposition
toward a disease, with the purpose to cure, heal, alleviate, relieve, alter, remedy,
ameliorate, improve or affect the disease, the symptoms of disease or the
predisposition toward disease. "Subject," as used herein, can refer to a mammal, e.g., a
human, or to an experimental animal or disease model. The subject can also be a
non-human animal, e.g., a horse, cow, goat, or other domestic animal. A therapeutic
agent includes, but is not limited to, small molecules, peptides, antibodies, ribozymes
and antisense oligonucleotides.

Disorders involving the brain include, but are not limited to, disorders
involving neurons, and disorders involving glia, such as astrocytes, oligodendrocytes,
ependymal cells, and microglia; cerebral edema, raised intracranial pressure and
herniation, and hydrocephalus; malformations and developmental diseases, such as
neural tube defects, forebrain anomalies, posterior fossa anomalies, and syringomyelia
and hydromyelia; perinatal brain injury; cerebrovascular diseases, such as those
related to hypoxia, ischemia, and infarction, including hypotension, hypoperfusion,
and low-flow states (global cerebral ischemia), and focal cerebral ischemia (infarction
from obstruction of local blood supply), intracranial hemorrhage, including
intracerebral (intraparenchymal) hemorrhage, subarachnoid hemorrhage and ruptured
berry aneurysms, and vascular malformations, hypertensive cerebrovascular disease,
including lacunar infarcts, slit hemorrhages, and hypertensive encephalopathy;
infections, such as acute meningitis, including acute pyogenic (bacterial) meningitis
and acute aseptic (viral) meningitis, acute focal suppurative infections, including brain
abscess, subdural empyema, and extradural abscess, chronic bacterial
meningoencephalitis, including tuberculosis and mycobacterioses, neurosyphilis, and
neuroborreliosis (Lyme disease), viral meningoencephalitis, including arthropod-borne
(Arbo) viral encephalitis, Herpes simplex virus Type 1, Herpes simplex virus
Type 2, Varicella-zoster virus (Herpes zoster), cytomegalovirus, poliomyelitis, rabies,
and human immunodeficiency virus 1, including HIV-1 meningoencephalitis
(subacute encephalitis), vacuolar myelopathy, AIDS-associated myopathy, peripheral
neuropathy, and AIDS in children, progressive multifocal leukoencephalopathy,
subacute sclerosing panencephalitis, fungal meningoencephalitis, other infectious
diseases of the nervous system; transmissible spongiform encephalopathies (prion
diseases); demyelinating diseases, including multiple sclerosis, multiple sclerosis
variants, acute disseminated encephalomyelitis and acute necrotizing hemorrhagic
encephalomyelitis, and other diseases with demyelination; degenerative diseases, such
as degenerative diseases affecting the cerebral cortex, including Alzheimer disease
and Pick disease, degenerative diseases of basal ganglia and brain stem, including
Parkinsonism, idiopathic Parkinson disease (paralysis agitans), progressive
supranuclear palsy, corticobasal degeneration, multiple system atrophy, including
striatonigral degeneration, Shy-Drager syndrome, and olivopontocerebellar atrophy,
and Huntington disease; spinocerebellar degenerations, including spinocerebellar
ataxias, including Friedreich ataxia, and ataxia-telangiectasia, degenerative diseases
affecting motor neurons, including amyotrophic lateral sclerosis (motor neuron
disease), bulbospinal atrophy (Kennedy syndrome), and spinal muscular atrophy;
inborn errors of metabolism, such as leukodystrophies, including Krabbe disease,
metachromatic leukodystrophy, adrenoleukodystrophy, Pelizaeus-Merzbacher
disease, and Canavan disease, mitochondrial encephalomyopathies, including Leigh
disease and other mitochondrial encephalomyopathies; toxic and acquired metabolic
diseases, including vitamin deficiencies such as thiamine (vitamin B1) deficiency and
vitamin B12 deficiency, neurologic sequelae of metabolic disturbances, including
hypoglycemia, hyperglycemia, and hepatic encephalopathy, toxic disorders, including
carbon monoxide, methanol, ethanol, and radiation, including combined methotrexate
and radiation-induced injury; tumors, such as gliomas, including astrocytoma,
including fibrillary (diffuse) astrocytoma and glioblastoma multiforme, pilocytic
astrocytoma, pleomorphic xanthoastrocytoma, and brain stem glioma,
oligodendroglioma, and ependymoma and related paraventricular mass lesions,
neuronal tumors, poorly differentiated neoplasms, including medulloblastoma, other
parenchymal tumors, including primary brain lymphoma, germ cell tumors, and
pineal parenchymal tumors, meningiomas, metastatic tumors, paraneoplastic
syndromes, peripheral nerve sheath tumors, including schwannoma, neurofibroma,
and malignant peripheral nerve sheath tumor (malignant schwannoma), and
neurocutaneous syndromes (phakomatoses), including neurofibromatosis, including
Type 1 neurofibromatosis (NF1) and Type 2 neurofibromatosis (NF2), tuberous
sclerosis, and Von Hippel-Lindau disease.

Furthermore, as disclosed in the background hereinabove, specific disorders
have been associated with function of the various sulfatases. Accordingly, the
sulfatases disclosed herein, having homology to specific sulfatases as disclosed
herein, are useful for diagnosis and treatment of the disorders associated with
sulfatase dysfunction as disclosed herein and for modulation of gene expression in the
affected tissues.
The sequences of the invention find use in diagnosis of disorders involving an
increase or decrease in sulfatase expression relative to normal expression, such as a
proliferative disorder, a differentiative disorder, or a developmental disorder. The
sequences also find use in modulating sulfatase-related responses. By "modulating" is
intended the upregulating or downregulating of a response. That is, the compositions
of the invention affect the targeted activity in either a positive or negative fashion.

The invention relates to novel sulfatases having the deduced amino acid
sequences shown in Figures 1, 5, 10, and 15 (SEQ ID NOS:1, 3, 5, and 7) or having the
amino acid sequences encoded by the deposited cDNAs, Patent Deposit Numbers ,
PTA-1639, PTA-1846, or . The deposited sequences, as well as the polypeptides
encoded by the sequences, are incorporated herein by reference and control in the event
of any conflict, such as a sequencing error, with the description in this application.

Thus, the present invention provides isolated or purified sulfatase
polypeptides and variants and fragments thereof. "Sulfatase polypeptide" or "sulfatase
protein" refers to the polypeptide in SEQ ID NOS:1, 3, 5, or 7 or encoded by the
deposited cDNAs. The term "sulfatase protein" or "sulfatase polypeptide," however,
further includes the numerous variants described herein, as well as fragments derived
from the full-length sulfatase and variants.

Sulfatase polypeptides can be purified to homogeneity. It is understood,
however, that preparations in which the polypeptide is not purified to homogeneity are
useful and considered to contain an isolated form of the polypeptide. The critical feature
is that the preparation allows for the desired function of the polypeptide, even in the
presence of considerable amounts of other components. Thus, the invention
encompasses various degrees of purity.

As used herein, a polypeptide is said to be "isolated" or "purified" when it is
substantially free of cellular material when it is isolated from recombinant and
non-recombinant cells, or free of chemical precursors or other chemicals when it is
chemically synthesized. A polypeptide, however, can be joined to another polypeptide
with which it is not normally associated in a cell and still be considered "isolated" or
"purified."

In one embodiment, the language "substantially free of cellular material"
includes preparations of sulfatase having less than about 30% (by dry weight) other
proteins (i.e., contaminating protein), less than about 20% other proteins, less than about
10% other proteins, or less than about 5% other proteins. When the polypeptide is
recombinantly produced, it can also be substantially free of culture medium, i.e., culture
medium represents less than about 20%, less than about 10%, or less than about 5% of
the volume of the protein preparation.

The sulfatase polypeptide is also considered to be isolated when it is part of a 
membrane preparation or is purified and then reconstituted with membrane vesicles or 
liposomes. 

The language "substantially free of chemical precursors or other chemicals"
includes preparations of the sulfatase polypeptide in which it is separated from chemical
precursors or other chemicals that are involved in its synthesis. The language
"substantially free of chemical precursors or other chemicals" includes, but is not limited
to, preparations of the polypeptide having less than about 30% (by dry weight) chemical
precursors or other chemicals, less than about 20% chemical precursors or other
chemicals, less than about 10% chemical precursors or other chemicals, or less than
about 5% chemical precursors or other chemicals.

In one embodiment, the sulfatase polypeptide comprises the amino acid sequence
shown in SEQ ID NOS:1, 3, 5, or 7. However, the invention also encompasses sequence
variants. By "variants" is intended proteins or polypeptides having an amino acid
sequence that is at least about 45%, 55%, 65%, preferably about 75%, 85%, 95%, or
98% identical to the amino acid sequence of SEQ ID NOS:1, 3, 5, or 7. Variants also
include polypeptides encoded by the cDNA insert of the plasmid deposited with
ATCC as Patent Deposit Numbers , PTA-1639, PTA-1846, or , or
polypeptides encoded by a nucleic acid molecule that hybridizes to the nucleic acid
molecule of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or a complement thereof, under
stringent conditions. In another embodiment, a variant of an isolated polypeptide of
the present invention differs by at least 1, but less than 5, 10, 20, 50, or 100 amino
acid residues from the sequence shown in SEQ ID NO:1, 3, 5, or 7. If alignment is
needed for this comparison, the sequences should be aligned for maximum identity.
"Looped" out sequences from deletions or insertions, or mismatches, are considered
differences. Such variants generally retain the functional activity of the 22438-like,
23553-like, 25278-like, or 26212-like proteins of the invention. Variants include
polypeptides that differ in amino acid sequence due to natural allelic variation or
mutagenesis.
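The difference count described above (mismatches plus residues looped out by insertions or deletions) can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned for maximum identity and are supplied as equal-length strings, with '-' marking gaps. The function names and the toy sequences are hypothetical.

```python
def count_differences(aligned_ref: str, aligned_var: str) -> int:
    """Count aligned columns that differ: mismatches plus looped-out (gapped) residues.

    A column in which both strings show a gap compares equal and is not counted.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for r, v in zip(aligned_ref, aligned_var) if r != v)


def within_limit(aligned_ref: str, aligned_var: str, limit: int) -> bool:
    """True if the variant differs by at least 1 but fewer than `limit` residues."""
    return 1 <= count_differences(aligned_ref, aligned_var) < limit


ref = "MKTAYIAKQR"
var = "MKSA-IAKQR"  # one substitution (T to S) and one deleted residue (Y)
print(count_differences(ref, var), within_limit(ref, var, 5))  # 2 True
```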

Variants include a substantially homologous protein encoded by the same genetic
locus in an organism, i.e., an allelic variant. Variants also encompass proteins derived
from other genetic loci in an organism, but having substantial homology to the sulfatase
of SEQ ID NOS:1, 3, 5, or 7. Variants also include proteins substantially homologous to
the sulfatase but derived from another organism, i.e., an ortholog. Variants also include
proteins that are substantially homologous to the sulfatase that are produced by chemical
synthesis. Variants also include proteins that are substantially homologous to the
sulfatase that are produced by recombinant methods. Variants retain the biological
activity (for example, sulfatase activity) of the polypeptide set forth by the reference
sequence (SEQ ID NOS:1, 3, 5, or 7). It is understood, however, that variants exclude
any amino acid sequences disclosed prior to the invention.

Preferred sulfatase polypeptides of the present invention have an amino acid
sequence sufficiently identical to the amino acid sequence of SEQ ID NOS:1, 3, 5, or
7. The term "sufficiently identical" is used herein to refer to a first amino acid or
nucleotide sequence that contains a sufficient or minimum number of identical or
equivalent (e.g., with a similar side chain) amino acid residues or nucleotides to a
second amino acid or nucleotide sequence such that the first and second amino acid or
nucleotide sequences have a common structural domain and/or common functional
activity. For example, amino acid or nucleotide sequences that contain a common
structural domain having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%,
96%, 97%, 98% or 99% identity are defined herein as sufficiently identical.

In one embodiment, a variant of the 23553 sulfatase is greater than 92%
homologous. In another embodiment, a variant of the 25278 sulfatase is greater than
50% identical. In another embodiment, a variant of the 26212 sulfatase is greater than
50% identical.

To determine the percent identity of two amino acid sequences, or of two
nucleic acid sequences, the sequences are aligned for optimal comparison purposes
(e.g., gaps can be introduced in one or both of a first and a second amino acid or
nucleic acid sequence for optimal alignment and non-homologous sequences can be
disregarded for comparison purposes). In a preferred embodiment, the length of a
reference sequence aligned for comparison purposes is at least 30%, preferably at
least 40%, more preferably at least 50%, even more preferably at least 60%, and even
more preferably at least 70%, 80%, 90%, or 100% of the length of the reference
sequence. The amino acid residues or nucleotides at corresponding amino acid
positions or nucleotide positions are then compared. When a position in the first
sequence is occupied by the same amino acid residue or nucleotide as the
corresponding position in the second sequence, then the molecules are identical at that
position (as used herein, amino acid or nucleic acid "identity" is equivalent to amino
acid or nucleic acid "homology"). The percent identity between the two sequences is
a function of the number of identical positions shared by the sequences, taking into
account the number of gaps, and the length of each gap, which need to be introduced
for optimal alignment of the two sequences.
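One common way to turn an alignment into a percent identity is to divide the identical positions by the number of aligned columns, so that every gap position lowers the score. The Python sketch below assumes that convention, which is only one of several in use; the GAP and ALIGN programs named in this description apply their own scoring. It takes an existing alignment as equal-length strings with '-' for gaps. It also reports the length of each gap, since the paragraph above names gap number and gap length as factors. The function names and sequences are hypothetical.

```python
import re


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical positions divided by aligned columns (gap columns included), times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both strings are gapped; they carry no residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def gap_lengths(aligned: str, gap: str = "-") -> list[int]:
    """Length of each gap (run of consecutive gap characters) in one aligned sequence."""
    return [len(run) for run in re.findall(re.escape(gap) + "+", aligned)]


a = "MKTAYIAKQR"
b = "MKSA--AKQR"
print(percent_identity(a, b), gap_lengths(b))  # 70.0 [2]
```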

The comparison of sequences and determination of percent identity between
two sequences can be accomplished using a mathematical algorithm. In a preferred
embodiment, the percent identity between two amino acid sequences is determined
using the Needleman and Wunsch (1970) J. Mol. Biol. 48:444-453 algorithm, which
has been incorporated into the GAP program in the GCG software package (available
at http://www.gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a
gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet
another preferred embodiment, the percent identity between two nucleotide sequences
is determined using the GAP program in the GCG software package (available at
http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50,
60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A particularly preferred set of
parameters (and the one that should be used if the practitioner is uncertain about what
parameters should be applied to determine if a molecule is within a sequence identity
or homology limitation of the invention) is a BLOSUM 62 scoring matrix with a
gap open penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
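The following Python sketch shows the core of the Needleman and Wunsch method for readers unfamiliar with it: a dynamic-programming table of best global alignment scores, followed by a traceback. It is a simplification, not the GAP program. A flat match/mismatch score stands in for the BLOSUM 62 or PAM250 matrix. A single linear penalty per gap position stands in for GAP's separate gap weight and length weight. All parameter values shown are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring the diagonal step on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # residue of a aligned to a gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # residue of b aligned to a gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA")[0])  # 4 (six matches, one gap)
```

An affine scheme, such as GAP's gap weight plus length weight, needs additional tables (Gotoh's extension). The linear version is kept here only to make the recurrence easy to follow.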

The percent identity between two amino acid or nucleotide sequences can be
determined using the algorithm of E. Meyers and W. Miller (1989) CABIOS 4:11-17,
which has been incorporated into the ALIGN program (version 2.0), using a PAM120
weight residue table, a gap length penalty of 12 and a gap penalty of 4.
The nucleic acid and protein sequences described herein can be used as a
"query sequence" to perform a search against public databases to, for example,
identify other family members or related sequences. Such searches can be performed
using the NBLAST and XBLAST programs (version 2.0) of Altschul et al. (1990) J.
Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the
NBLAST program, score = 100, wordlength = 12 to obtain nucleotide sequences
homologous to the nucleic acid molecules of the invention. BLAST protein searches
can be performed with the XBLAST program, score = 50, wordlength = 3 to obtain
amino acid sequences homologous to the protein molecules of the invention. To
obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as
described in Altschul et al. (1997) Nucleic Acids Res. 25(17):3389-3402. When
utilizing BLAST and Gapped BLAST programs, the default parameters of the
respective programs (e.g., XBLAST and NBLAST) can be used. See
http://www.ncbi.nlm.nih.gov.

The invention also encompasses polypeptides having a lower degree of
identity but having sufficient similarity so as to perform one or more of the same
functions performed by the sulfatase. Similarity is determined by conservative amino
acid substitution, as shown in Table 1. Such substitutions are those that substitute a
given amino acid in a polypeptide by another amino acid of like characteristics.
Conservative substitutions are likely to be phenotypically silent. Typically seen as
conservative substitutions are the replacements, one for another, among the aliphatic
amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser and Thr;
exchange of the acidic residues Asp and Glu; substitution between the amide residues
Asn and Gln; exchange of the basic residues Lys and Arg; and replacements among
the aromatic residues Phe and Tyr. Guidance concerning which amino acid changes are
likely to be phenotypically silent is found in Bowie et al., Science 247:1306-1310
(1990).
TABLE 1. Conservative Amino Acid Substitutions.

Aromatic       Phenylalanine
               Tryptophan
               Tyrosine
Hydrophobic    Leucine
               Isoleucine
               Valine
Polar          Glutamine
               Asparagine
Basic          Arginine
               Lysine
               Histidine
Acidic         Aspartic Acid
               Glutamic Acid
Small          Alanine
               Serine
               Threonine
               Methionine
               Glycine
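Table 1 can be encoded directly as residue groups. The Python sketch below treats a substitution as conservative when both residues fall in the same Table 1 group. The aliphatic, hydroxyl, and amide groupings named in the paragraph before the table differ slightly from Table 1, so following Table 1 is an assumption. The sketch compares equal-length, ungapped sequences, and the names and sequences are hypothetical.

```python
# Residue groups transcribed from Table 1 (one-letter codes).
TABLE_1_GROUPS = {
    "Aromatic": set("FWY"),
    "Hydrophobic": set("LIV"),
    "Polar": set("QN"),
    "Basic": set("RKH"),
    "Acidic": set("DE"),
    "Small": set("ASTMG"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share a Table 1 group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group for group in TABLE_1_GROUPS.values())


def classify_substitutions(reference: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, reference residue, variant residue, conservative?) per substitution."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (no gaps)")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


print(classify_substitutions("MKDLS", "MRELP"))
# [(2, 'K', 'R', True), (3, 'D', 'E', True), (5, 'S', 'P', False)]
```

Proline appears in no Table 1 group, so any substitution to or from it is reported as non-conservative under this encoding.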



A variant polypeptide can differ in amino acid sequence by one or more
substitutions, deletions, insertions, inversions, fusions, and truncations or a
combination of any of these. Variant polypeptides can be fully functional or can lack
function in one or more activities. Thus, in the present case, variations can affect the
function, for example, of one or more of regions including a metal (e.g., Ca2+)-binding
domain, activation domain, sulfatase catalytic domain, the region containing a
propeptide, regulatory regions, substrate binding regions, regions involved in
membrane association or subcellular localization, regions involved in
post-translational modification, for example, by phosphorylation, and regions that are
important for effector function (i.e., agents that act upon the protein, such as in the
conversion of cysteine to 2-amino-3-oxopropionic acid or serine semi-aldehyde).

Fully functional variants typically contain only conservative variation or
variation in non-critical residues or in non-critical regions. Functional variants can also
contain substitution of similar amino acids, which results in no change or an insignificant
change in function. Alternatively, such substitutions may positively or negatively affect
function to some degree.

Non-functional variants typically contain one or more non-conservative amino acid substitutions, deletions, insertions, inversions, or truncations, or a substitution, insertion, inversion, or deletion in a critical residue or critical region.

As indicated, variants can be naturally occurring or can be made by recombinant means or chemical synthesis to provide useful and novel characteristics for the sulfatase polypeptide. This includes preventing immunogenicity of pharmaceutical formulations by preventing protein aggregation.

Useful variations further include alteration of functional activity. For example, one embodiment involves a variation at the substrate binding site that results in binding without hydrolysis, or in more or less hydrolysis of the substrate than the wild type. A further useful variation at the same site can result in altered affinity for the substrate. Useful variations also include changes that confer affinity for another substrate. Useful variations further include the ability to bind an effector molecule with greater or lesser affinity, for example failing to bind it, or binding it but not releasing it. Further useful variations include alteration in the ability of the propeptide to be cleaved by a protease, including alteration in the binding or recognition site. Further, the cleavage site can also be modified so that recognition and cleavage are carried out by a different protease. A specific useful variation involves a variation in the ability to be bound or activated by the enzyme that activates the sulfatase by converting cysteine to 2-amino-3-oxopropionic acid, or serine semialdehyde. Further variation could include a variation in the specificity of metal binding.
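Sulfatase activation depends on converting a specific cysteine to 2-amino-3-oxopropionic acid. In the sulfatase literature this cysteine typically lies in a short conserved motif usually written C-X-P-X-R; that motif is an assumption here, since this section does not define it. The minimal sketch below locates candidate activation sites and introduces a substitution at the cysteine, the kind of variant that would be expected to alter activation. The helper names and example sequence are hypothetical.

```python
import re

# Assumed activation motif Cys-X-Pro-X-Arg; the lookahead reports overlapping hits.
ACTIVATION_MOTIF = re.compile(r"(?=(C.P.R))")


def find_activation_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Cys, matched motif) for each C-X-P-X-R."""
    return [(m.start() + 1, m.group(1)) for m in ACTIVATION_MOTIF.finditer(protein.upper())]


def substitute(protein: str, position: int, new_residue: str) -> str:
    """Replace the residue at a 1-based position (e.g. the motif Cys) with new_residue."""
    index = position - 1
    if not 0 <= index < len(protein):
        raise IndexError("position outside the sequence")
    return protein[:index] + new_residue + protein[index + 1:]


sequence = "MAACTPGRSAACLPSRK"  # toy sequence, not a sulfatase
print(find_activation_motifs(sequence))  # [(4, 'CTPGR'), (12, 'CLPSR')]
```

A motif match is only a candidate; whether the cysteine is actually modified depends on the structural context recognized by the activating enzyme.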

Another useful variation provides a fusion protein in which one or more domains or subregions are operationally fused to one or more domains, subregions, or motifs from another sulfatase. For example, a transmembrane domain from a protein can be introduced into the sulfatase so that the protein is anchored at the cell surface. Other permutations include changing the number of sulfatase domains and mixing sulfatase domains from different sulfatase families so that substrate specificity is altered. Mixing these various domains can allow the formation of novel sulfatase molecules with different host cell, subcellular localization, substrate, and effector molecule (one that acts on the sulfatase) specificity.

The term "substrate" is intended to refer not only to the sulfated substrate that is cleaved by the sulfatase domain, but also to any component with which the polypeptide interacts in order to produce an effect on that component, or a subsequent biological effect that results from interacting with that component. This can include, but is not limited to, interaction with the sulfatase activation enzyme and with components involved in the conversion of 3'-phosphoadenosine 5'-phosphosulfate to adenosine 3',5'-bisphosphate.

Amino acids that are essential for function can be identified by methods known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al. (1989) Science 244:1081-1085). The latter procedure introduces single alanine mutations at every residue in the molecule. The resulting mutant molecules are then tested for biological activity, such as peptide bond hydrolysis in vitro, or for related biological activity, such as proliferative activity. Sites that are critical for binding can also be determined by structural analysis such as crystallization, nuclear magnetic resonance, or photoaffinity labeling (Smith et al. (1992) J. Mol. Biol. 224:899-904; de Vos et al. (1992) Science 255:306-312).
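The alanine-scanning procedure is mechanical enough to sketch directly: generate one single-alanine mutant per residue, assay each, and flag positions whose mutants lose most of the activity. The sketch below assumes the activities have already been measured; the 20% retention cut-off, the helper names and the example values are hypothetical.

```python
def alanine_scan(protein: str) -> list[tuple[str, str]]:
    """Return (mutation label, mutant sequence) for a single Ala substitution
    at every residue; positions that are already Ala are skipped."""
    sequence = protein.upper()
    mutants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # Ala -> Ala would not be a mutation
        mutant = sequence[:index] + "A" + sequence[index + 1:]
        mutants.append((f"{residue}{index + 1}A", mutant))
    return mutants


def flag_critical(activities: dict[str, float], wild_type_activity: float,
                  threshold: float = 0.2) -> list[str]:
    """Labels of mutants retaining less than `threshold` of wild-type activity
    (hypothetical cut-off)."""
    return [label for label, activity in activities.items()
            if activity / wild_type_activity < threshold]


print(alanine_scan("MKAD"))  # [('M1A', 'AKAD'), ('K2A', 'MAAD'), ('D4A', 'MKAA')]
```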

The invention thus also includes polypeptide fragments of the sulfatases. Fragments can be derived from the amino acid sequence shown in SEQ ID NOS:1, 3, 5, or 7. The invention also encompasses fragments of the variants of the sulfatase polypeptides as described herein. The fragments to which the invention pertains, however, are not to be construed as encompassing fragments that may have been disclosed prior to the present invention.

A fragment can comprise at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more contiguous amino acids. Fragments can retain one or more of the biological activities of the protein, for example as discussed above. Fragments also include those that can be used as an immunogen to generate sulfatase antibodies.
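The number of fragments covered by such a range grows quickly with sequence length. A minimal sketch, assuming fragments are contiguous windows of a single sequence, enumerates them and counts how many have at least a given length; for a hypothetical 500-residue polypeptide there are 120,786 fragments of 10 or more residues.

```python
from typing import Iterator


def contiguous_fragments(protein: str, length: int) -> Iterator[tuple[int, str]]:
    """Yield (1-based start, fragment) for every contiguous window of `length` residues."""
    if length < 1:
        raise ValueError("length must be positive")
    for start in range(len(protein) - length + 1):
        yield start + 1, protein[start:start + length]


def count_fragments(sequence_length: int, min_length: int) -> int:
    """Number of contiguous fragments with at least `min_length` residues.

    Summing (n - L + 1) over L = min_length..n gives m(m + 1)/2 with
    m = n - min_length + 1.
    """
    m = sequence_length - min_length + 1
    return m * (m + 1) // 2 if m > 0 else 0


print(list(contiguous_fragments("MKTAYI", 4)))  # [(1, 'MKTA'), (2, 'KTAY'), (3, 'TAYI')]
print(count_fragments(500, 10))  # 120786
```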

For example, for the 25278 sulfatase, the invention encompasses amino acid fragments greater than 5 amino acids, particularly from regions up to around nucleotide 450 and beyond around nucleotide 1520. Specific fragments which may be excluded include those that are underlined in Figure 1. However, even in the region between around nucleotide 450 and around nucleotide 1520, fragments include those of five or more amino acids, excluding those which may have been disclosed prior to the present invention.

For the 23553 sulfatase, fragments particularly include fragments of 5 amino acids or more up to around nucleotide 670.

For the 26212 sulfatase, for example, fragments containing 5 or more amino acids up to about nucleotide 572 are particularly encompassed by the invention. However, fragments of 5 amino acids or more encoded by around nucleotide 572 to around nucleotide 1985 are also encompassed by the invention, with the understanding that such fragments do not encompass those which may have been disclosed prior to the invention. For example, these can include the sections underlined in Figure 15.
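Several of the fragment boundaries above are given as nucleotide positions, whereas the fragments themselves are amino acid sequences. Converting between the two requires the position of the first base of the start codon in the relevant nucleotide sequence, which this section does not state, so `cds_start` below is a hypothetical parameter.

```python
def nucleotide_to_residue(nucleotide: int, cds_start: int) -> int:
    """Map a 1-based nucleotide position to the 1-based residue (codon) number,
    given the 1-based position of the first base of the start codon."""
    if nucleotide < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nucleotide - cds_start) // 3 + 1


def residue_range(nt_start: int, nt_end: int, cds_start: int) -> tuple[int, int]:
    """Residues whose codons overlap the nucleotide interval [nt_start, nt_end]."""
    return (nucleotide_to_residue(nt_start, cds_start),
            nucleotide_to_residue(nt_end, cds_start))
```

For example, with a hypothetical start codon at nucleotide 100, the interval from nucleotide 572 to nucleotide 1985 maps to residues 158 through 629.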

Biologically active fragments (peptides which are, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 50, 100 or more amino acids in length) can comprise a functional site. Such sites include, but are not limited to, those discussed above, such as a catalytic site, regulatory site, site important for substrate recognition or binding, regions containing a sulfatase domain or motif, phosphorylation sites, glycosylation sites, and other functional sites disclosed herein.
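Glycosylation sites are among the functional sites a fragment may contain. For N-linked glycosylation, the commonly used sequence rule is the sequon Asn-X-Ser/Thr, where X is any residue other than Pro. The sketch below applies that rule (an assumption here, not a rule stated in this document) and only predicts candidate sites; whether a site is actually glycosylated depends on the protein and the host cell.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Asn, sequon) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MNGSAANPTKNKT"))  # [(2, 'NGS'), (11, 'NKT')]; N-P-T is skipped
```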

Fragments, for example, can extend in one or both directions from the functional site to encompass 5, 10, 15, 20, 30, 40, 50, or up to 100 amino acids. Further, fragments can include sub-fragments of the specific sites or regions disclosed herein, which sub-fragments retain the function of the site or region from which they are derived.
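A fragment that extends a fixed number of residues in both directions from a functional site can be computed directly. In this minimal sketch the site coordinates are 1-based and inclusive, the flank length is a free parameter, and the fragment is clipped where it would run past either end of the sequence; the function name and example sequence are hypothetical.

```python
def fragment_around_site(protein: str, site_start: int, site_end: int,
                         flank: int) -> tuple[int, int, str]:
    """Extend a 1-based, inclusive site by `flank` residues on each side,
    clipped to the sequence ends; returns (start, end, fragment)."""
    if not 1 <= site_start <= site_end <= len(protein):
        raise ValueError("site must lie within the sequence")
    start = max(1, site_start - flank)
    end = min(len(protein), site_end + flank)
    return start, end, protein[start - 1:end]


print(fragment_around_site("MKTAYIAKQR", 5, 6, 2))  # (3, 8, 'TAYIAK')
```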

The invention also provides fragments with immunogenic properties. These contain an epitope-bearing portion of the sulfatase polypeptide and variants. These epitope-bearing peptides are useful to raise antibodies that bind specifically to a sulfatase polypeptide or region or fragment. These peptides can contain at least 10, at least 12, at least 14, or between about 15 and about 30 amino acids. The epitope-bearing sulfatase polypeptides may be produced by any conventional means (Houghten, R.A. (1985) Proc. Natl. Acad. Sci. USA 82:5131-5135). Simultaneous multiple peptide synthesis is described in U.S. Patent No. 4,631,211.

Non-limiting examples of antigenic polypeptides that can be used to generate antibodies include peptides derived from extracellular regions. Regions having a high antigenicity index are shown in Figures 3, 7, 12, and 17. However, intracellularly made antibodies ("intrabodies"), which would recognize intracellular peptide regions, are also encompassed.

Fragments can be discrete (not fused to other amino acids or polypeptides) or can be within a larger polypeptide. Further, several fragments can be comprised within a single larger polypeptide. In one embodiment, a fragment designed for expression in a host can have heterologous pre- and pro-polypeptide regions fused to the amino terminus of the sulfatase polypeptide fragment and an additional region fused to the carboxyl terminus of the fragment.

The invention thus provides chimeric or fusion proteins. These comprise a sulfatase peptide sequence operatively linked to a heterologous peptide having an amino acid sequence not substantially homologous to the sulfatase polypeptide. "Operatively linked" indicates that the sulfatase polypeptide and the heterologous peptide are fused in-frame. The heterologous peptide can be fused to the N-terminus or C-terminus of the sulfatase polypeptide or can be internally located.
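At the DNA level, fusing "in-frame" means the upstream coding sequence must contain a whole number of codons and must not contain a stop codon that would end translation before the junction. A minimal sketch of that check for a two-part fusion follows, assuming plain DNA strings written 5' to 3' and the standard-code stop codons; the helper names and sequences are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def upstream_is_in_frame(upstream: str) -> bool:
    """True if the upstream part is whole codons with no in-frame stop codon."""
    upstream = upstream.upper()
    if len(upstream) % 3 != 0:
        return False  # a partial codon would shift the downstream reading frame
    codons = (upstream[i:i + 3] for i in range(0, len(upstream), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding sequences, refusing fusions that break the reading frame."""
    if not upstream_is_in_frame(upstream):
        raise ValueError("upstream part shifts the frame or stops before the junction")
    return upstream.upper() + downstream.upper()


print(fuse_in_frame("ATGGCT", "GGCTAA"))  # ATGGCTGGCTAA
```

In practice this means the native stop codon of the upstream partner must be removed before the downstream sequence is attached.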

In one embodiment, the fusion protein does not affect sulfatase function per se. For example, the fusion protein can be a GST-fusion protein in which sulfatase sequences are fused to the N- or C-terminus of the GST sequences. Other types of fusion proteins include, but are not limited to, enzymatic fusion proteins, for example beta-galactosidase fusions, yeast two-hybrid GAL4 fusions, poly-His fusions and Ig fusions. Such fusion proteins, particularly poly-His fusions, can facilitate the purification of recombinant sulfatase polypeptide. In certain host cells (e.g., mammalian host cells), expression and/or secretion of a protein can be increased by using a heterologous signal sequence. Therefore, in another embodiment, the fusion protein contains a heterologous signal sequence at its C- or N-terminus.

EP-A-0 464 533 discloses fusion proteins comprising various portions of immunoglobulin constant regions. The Fc portion is useful in therapy and diagnosis; for example, it can improve pharmacokinetic properties (EP-A 0232 262). In drug discovery, for example, human proteins have been fused with Fc portions for the purpose of high-throughput screening assays to identify antagonists (Bennett et al. (1995) J. Mol. Recog. 8:52-58; Johanson et al. J. Biol. Chem. 270:9459-9471). Thus, this invention also encompasses soluble fusion proteins containing a sulfatase polypeptide and various portions of the constant regions of heavy or light chains of immunoglobulins of various classes (IgG, IgM, IgA, IgE). Preferred as immunoglobulin is the constant part of the heavy chain of human IgG, particularly IgG1, where fusion takes place at the hinge region. For some uses it is desirable to remove the Fc after the fusion protein has been used for its intended purpose, for example when the fusion protein is to be used as an antigen for immunizations. In a particular embodiment, the Fc part can be removed in a simple way by a cleavage sequence, which is also incorporated and can be cleaved with factor Xa.
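Factor Xa cleaves on the carboxyl side of the Arg in the recognition sequence Ile-(Glu or Asp)-Gly-Arg, which is the sequence usually engineered into such fusions. The sketch below predicts the products of cleaving a fusion at every such site; it works on the sequence alone, does not model whether a site is accessible to the protease, and uses a toy sequence in place of a real sulfatase-Fc construct.

```python
import re

# Factor Xa recognition sequence Ile-(Glu/Asp)-Gly-Arg; cleavage follows the Arg.
FACTOR_XA_SITE = re.compile(r"I[ED]GR")


def factor_xa_cleavage(protein: str) -> list[str]:
    """Split a sequence after every Ile-(Glu/Asp)-Gly-Arg site (prediction only)."""
    pieces, previous = [], 0
    for match in FACTOR_XA_SITE.finditer(protein.upper()):
        pieces.append(protein[previous:match.end()])
        previous = match.end()
    pieces.append(protein[previous:])
    return [piece for piece in pieces if piece]  # drop an empty tail after a terminal site


print(factor_xa_cleavage("MKSIEGRGSAAA"))  # ['MKSIEGR', 'GSAAA']
```

Note that the recognition residues remain attached to the amino-terminal product.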

A chimeric or fusion protein can be produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different protein sequences are ligated together in-frame in accordance with conventional techniques. In another embodiment, the fusion gene can be synthesized by conventional techniques, including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments, which can subsequently be annealed and re-amplified to generate a chimeric gene sequence (see Ausubel et al. (1992) Current Protocols in Molecular Biology). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST protein). A sulfatase-encoding nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked in-frame to the sulfatase.
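The anchor-primer approach leaves two PCR products whose adjoining ends share an identical overlap; annealing through that overlap and re-amplifying yields the joined sequence. The in silico equivalent is a merge on the longest shared overlap, sketched below with an arbitrary default minimum overlap of 15 bases (a hypothetical choice). Real primer design also has to consider melting temperature and specificity, which this sketch ignores.

```python
def merge_overlapping_fragments(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two fragments whose ends share an identical overlap (as created by
    anchor primers with complementary overhangs), using the longest overlap found."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    # Try the longest possible overlap first, down to the minimum allowed.
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError(f"no overlap of at least {min_overlap} bases")


print(merge_overlapping_fragments("ATGAAACCCGGG", "CCCGGGTTTTAA", min_overlap=6))
# ATGAAACCCGGGTTTTAA
```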

Another form of fusion protein is one that directly affects sulfatase functions. Accordingly, the present invention encompasses a sulfatase polypeptide in which one or more of the sulfatase regions (or parts thereof) has been replaced by heterologous or homologous regions (or parts thereof) from another sulfatase. Various permutations are therefore possible, for example, as discussed above. Thus, chimeric sulfatases can be formed in which one or more of the native domains or subregions has been duplicated, removed, or replaced by another. This includes, but is not limited to, catalytic sulfatase or substrate binding domains, and regions involved in activation.

It is understood, however, that such regions could be derived from a sulfatase that has not yet been characterized. Moreover, sulfatase function can be derived from peptides that contain these functions but are not in a sulfatase family.

The isolated 22438 sulfatase protein can be purified from cells that naturally express it, such as dorsal root ganglion (DRG) neurons, including small and medium-sized neurons, spinal cord, including interneurons and motor neurons, and brain; it can especially be purified from cells that have been altered to express it (recombinant), or it can be synthesized using known protein synthesis methods.

The isolated 23553 sulfatase protein can be purified from cells that naturally express it, such as cells from any of the tissues shown in Figures 9 and 21-26; it can especially be purified from cells that have been altered to express it (recombinant), or it can be synthesized using known protein synthesis methods.

The isolated 25278 sulfatase protein can be purified from cells that naturally express it, such as cells from any of the tissues shown in Figures 14 and 28; it can especially be purified from cells that have been altered to express it (recombinant), or it can be synthesized using known protein synthesis methods.

The isolated 26212 sulfatase protein can be purified from cells that naturally express it, such as cells from any of the tissues shown in Figures 30-36; it can especially be purified from cells that have been altered to express it (recombinant), or it can be synthesized using known protein synthesis methods.

In one embodiment, the protein is produced by recombinant DNA techniques. For example, a nucleic acid molecule encoding the sulfatase polypeptide is cloned into an expression vector, the expression vector is introduced into a host cell, and the protein is expressed in the host cell. The protein can then be isolated from the cells by an appropriate purification scheme using standard protein purification techniques.

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to as the 20 naturally occurring amino acids. Further, many amino acids, including the terminal amino acids, may be modified by natural processes, such as processing and other post-translational modifications, or by chemical modification techniques well known in the art. Common modifications that occur naturally in polypeptides are described in basic texts, detailed monographs, and the research literature, and they are well known to those of skill in the art.

Accordingly, the polypeptides also encompass derivatives or analogs in which a substituted amino acid residue is not one encoded by the genetic code; in which a substituent group is included; in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol); or in which additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence, a sequence for purification of the mature polypeptide, or a pro-protein sequence.

Known modifications include, but are not limited to, acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent crosslinks, formation of cystine, formation of pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination.

Such modifications are well known to those of skill in the art and have been described in great detail in the scientific literature. Several particularly common modifications, for instance glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation, and ADP-ribosylation, are described in most basic texts, such as Proteins - Structure and Molecular Properties, 2nd ed., T.E. Creighton, W.H. Freeman and Company, New York (1993). Many detailed reviews are available on this subject, such as Wold, F., Posttranslational Covalent Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York, 1-12 (1983); Seifter et al. (1990) Meth. Enzymol. 182:626-646; and Rattan et al. (1992) Ann. N.Y. Acad. Sci. 663:48-62.

As is also well known, polypeptides are not always entirely linear. For instance, polypeptides may be branched as a result of ubiquitination, and they may be circular, with or without branching, generally as a result of post-translational events, including natural processing events and events brought about by human manipulation which do not occur naturally. Circular, branched, and branched circular polypeptides may be synthesized by non-translational natural processes and by synthetic methods.

Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side chains, and the amino or carboxyl termini. Blockage of the amino or carboxyl group in a polypeptide, or both, by a covalent modification is common in naturally occurring and synthetic polypeptides. For instance, the amino-terminal residue of polypeptides made in E. coli, prior to proteolytic processing, will almost invariably be N-formylmethionine.

The modifications can be a function of how the protein is made. For recombinant polypeptides, for example, the modifications will be determined by the post-translational modification capacity of the host cell and the modification signals in the polypeptide amino acid sequence. Accordingly, when glycosylation is desired, a polypeptide should be expressed in a glycosylating host, generally a eukaryotic cell. Insect cells often carry out the same post-translational glycosylations as mammalian cells and, for this reason, insect cell expression systems have been developed to efficiently express mammalian proteins having native patterns of glycosylation. Similar considerations apply to other modifications.

The same type of modification may be present to the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may contain more than one type of modification.

Polypeptide Uses 

The protein sequences of the present invention can be used as a "query sequence" to perform a search against public databases to, for example, identify other family members or related sequences. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12, to obtain nucleotide sequences homologous to the nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score = 50, wordlength = 3, to obtain amino acid sequences homologous to the proteins of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. See http://www.ncbi.nlm.nih.gov.
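The searches described above can be approximated with the current NCBI BLAST+ command-line programs. The sketch below only builds the command lines (it does not run them) using the stated word sizes, and filters tabular (`-outfmt 6`) output by bit score. Treating the historical "score = 100" and "score = 50" settings as bit-score cut-offs is an assumption, and the file and database names are placeholders.

```python
# Word sizes from the text: 12 for nucleotide searches, 3 for protein searches.
WORD_SIZES = {"blastn": 12, "blastp": 3}


def blast_command(program: str, query_fasta: str, database: str) -> list[str]:
    """Build a BLAST+ command line; assumes blastn/blastp are installed."""
    if program not in WORD_SIZES:
        raise ValueError("program must be 'blastn' or 'blastp'")
    return [program, "-task", program, "-query", query_fasta, "-db", database,
            "-word_size", str(WORD_SIZES[program]), "-outfmt", "6"]


def filter_hits(tabular_output: str, min_score: float) -> list[tuple[str, float]]:
    """Keep (subject id, bit score) for hits scoring at least min_score.
    In -outfmt 6, field 2 is the subject id and field 12 the bit score."""
    hits = []
    for line in tabular_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject, bitscore = fields[1], float(fields[11])
        if bitscore >= min_score:
            hits.append((subject, bitscore))
    return hits
```

The returned command list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the resulting `stdout` handed to `filter_hits`.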

Sulfatase polypeptides are useful for producing antibodies specific for the sulfatase or for its regions or fragments. Regions having a high antigenicity index score are shown in Figures 3, 7, 12, and 17.

Sulfatase polypeptides are useful for biological assays related to sulfatases. Such assays involve any of the known sulfatase functions, activities, or properties useful for diagnosis and treatment of sulfatase-related conditions, including those in the references cited herein, which are incorporated by reference for these assays, functions, and disorders.

These assays include, but are not limited to, binding to and/or cleaving specific substrates to produce fragments, steady-state levels of sulfated compounds, cysteine modification, and biological assays related to the functions produced by sulfated compounds. Specific substrates useful for assays related to sulfate conjugate hydrolysis include, but are not limited to, xenobiotics, thyroid hormones, steroids, and catechols. Specific sulfate conjugates include, but are not limited to, 3α-sulfatolithocholyltaurine, sulfate conjugates of estrone, 4-methylumbelliferone, and harmol, sulfated cartilage and proteoglycans, 4-nitrophenol, simple phenols, hydroxyarylamines, iodothyronines, catecholamines, 1-naphthyl, salbutamol, estrogens, ethinylestradiol, equilenin, diethylstilbestrol, androgens, cholesterol bile salts, pregnenolone, benzylic alcohols, glycolipid sulfates, complex carbohydrates such as dermatan and chondroitin sulfate, steroid sulfate, sulfate conjugates of xenobiotics, cholesterol sulfate, xenobiotic phenyls, o-cresol, vanillin, eugenol, m-cresol, thymol, ethyM,4-dihydroxybenzoate, p-cresol, sesamol, methyl-2,6-dihydroxy-4-methylbenzyloate, methyl-2,4-dihydroxybenzoate, methyl-3,5-dihydroxybenzoate, tyramine, dopamine, 5-hydroxytryptamine, pyrogallol, 4-nitrocatechol sulfate, estrone sulfate, metabolites of the cytochrome P450 monooxygenase system, dehydroepiandrosterone sulfate (DHEAS), minoxidil, cicletanine, sulfated mutagens and carcinogens, such as aromatic amines (including heterocyclic amines) and benzylic alcohols of chemicals such as polycyclic aromatic hydrocarbons, safrole and estragole, glycosaminoglycans, sulfolipids, beta-hydroxysteroids, sulfate esters of chromogenic or fluorogenic aromatic compounds, cerebroside sulfate, keratan sulfate, and heparan sulfate. Substrates also include any in the references cited herein, which are incorporated herein by reference for these substrates. Accordingly, the assays include, but are not limited to, these sulfated substrates and the biological effects of sulfation or desulfation of these substrates, the associated biochemical, cellular, or phenotypic effects of sulfation or desulfation, and any of the other biological or functional properties of these proteins, including, but not limited to, those disclosed herein and in any reference cited herein, which is incorporated herein by reference for the disclosure of these properties and for the assays based on these properties. Further, assays may relate to changes in the protein per se and to the effects of these changes, for example, activation of the sulfatase by modification of a cysteine residue as disclosed herein, cleavage of the propeptide by a proteinase, induction of expression of the protein in vivo, and inhibition of function, as well as any other effects on the protein mentioned herein or cited in any reference herein, which are incorporated herein by reference for these effects and for the subsequent biological consequences of these effects.
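Several of the listed substrates are chromogenic; hydrolysis of a 4-nitrophenyl sulfate, for example, releases 4-nitrophenol, which absorbs strongly near 405 nm in its alkaline (phenolate) form. A worked Beer-Lambert conversion from a blank-corrected absorbance change to specific activity is sketched below. The default extinction coefficient of 18,000 M^-1 cm^-1 is an approximate literature value and an assumption here; it should be determined under the actual assay conditions.

```python
def nitrophenol_released_umol(delta_a405: float, volume_ml: float,
                              epsilon_m_cm: float = 18000.0,  # assumed, pH dependent
                              path_cm: float = 1.0) -> float:
    """Micromoles of 4-nitrophenol from a blank-corrected absorbance change (A = e*c*l)."""
    concentration_m = delta_a405 / (epsilon_m_cm * path_cm)  # mol/L
    return concentration_m * (volume_ml / 1000.0) * 1e6       # mol -> micromol


def specific_activity(delta_a405: float, minutes: float, volume_ml: float,
                      protein_mg: float, epsilon_m_cm: float = 18000.0,
                      path_cm: float = 1.0) -> float:
    """Specific activity in micromol of product per minute per mg of protein."""
    product = nitrophenol_released_umol(delta_a405, volume_ml, epsilon_m_cm, path_cm)
    return product / minutes / protein_mg
```

For example, a blank-corrected increase of 0.36 absorbance units in a 1 ml reaction over 10 minutes with 1 µg of enzyme corresponds to 0.02 µmol of product, or 2 µmol/min/mg.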

Sulfatase polypeptides are also useful in drug screening assays, in cell-based or cell-free systems. Cell-based systems can be native, i.e., cells that normally express sulfatase, such as those discussed above, especially tumor cells, either as a biopsy or expanded in cell culture. In one embodiment, however, cell-based assays involve recombinant host cells expressing sulfatase. Accordingly, these drug-screening assays can be based on effects on protein function as described above for biological assays useful for diagnosis and treatment.

Determining the ability of the test compound to interact with a sulfatase can also comprise determining the ability of the test compound to preferentially bind to the polypeptide as compared to the ability of a known binding molecule to bind to the polypeptide.

The polypeptides can be used to identify compounds that modulate sulfatase activity. Such compounds, for example, can increase or decrease affinity or rate of binding to substrate, compete with substrate for binding to sulfatase, or displace substrate bound to sulfatase. Both sulfatase and appropriate variants and fragments can be used in high-throughput screens to assay candidate compounds for the ability to bind to sulfatase. These compounds can be further screened against a functional sulfatase to
determine the effect of the compound on sulfatase activity. Compounds can be identified that activate (agonist) or inactivate (antagonist) sulfatase to a desired degree. Modulatory methods can be performed in vitro (e.g., by culturing the cell with the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject).
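In a screen, each compound's effect on sulfatase activity is usually expressed relative to untreated (vehicle) controls after subtracting background. A minimal sketch follows; the 50% and 150% thresholds used to call a compound an inhibitor or an activator are hypothetical and would be set for each assay.

```python
def percent_activity(signal: float, vehicle_signal: float, background: float) -> float:
    """Background-corrected activity as a percentage of the vehicle control."""
    window = vehicle_signal - background
    if window == 0:
        raise ValueError("vehicle control must differ from background")
    return 100.0 * (signal - background) / window


def classify(percent: float, low: float = 50.0, high: float = 150.0) -> str:
    """Call a compound using hypothetical activity thresholds."""
    if percent <= low:
        return "inhibitor"   # candidate antagonist
    if percent >= high:
        return "activator"   # candidate agonist
    return "inactive"


print(classify(percent_activity(550, 1050, 50)))  # inhibitor (50% of control)
```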

Sulfatase polypeptides can be used to screen a compound for the ability to stimulate or inhibit interaction between sulfatase protein and a target molecule that normally interacts with the sulfatase, for example, a substrate of the sulfatase domain. The assay includes the steps of combining sulfatase protein with a candidate compound under conditions that allow the sulfatase protein or fragment to interact with the target molecule, and detecting the formation of a complex between the sulfatase protein and the target, or detecting the biochemical consequence of the interaction between the sulfatase and the target.

Determining the ability of the sulfatase to bind to a target molecule can also be accomplished using a technology such as real-time Biomolecular Interaction Analysis (BIA) (Sjolander et al. (1991) Anal. Chem. 63:2338-2345; Szabo et al. (1995) Curr. Opin. Struct. Biol. 5:699-705). As used herein, "BIA" is a technology for studying biospecific interactions in real time, without labeling any of the interactants (e.g., BIAcore™). Changes in the optical phenomenon of surface plasmon resonance (SPR) can be used as an indication of real-time reactions between biological molecules.
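Surface plasmon resonance data are often interpreted with the simple 1:1 (Langmuir) binding model, in which the equilibrium dissociation constant is K_D = k_off / k_on and the steady-state response is R_eq = R_max C / (C + K_D). The sketch below evaluates those expressions; the 1:1 model and all numerical values are assumptions for illustration, and instrument software normally fits these parameters to measured sensorgrams.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D (M) from the association (1/(M*s)) and dissociation (1/s) rate constants."""
    return k_off / k_on


def equilibrium_response(concentration: float, k_d: float, r_max: float) -> float:
    """Steady-state 1:1 response: R_eq = R_max * C / (C + K_D)."""
    return r_max * concentration / (concentration + k_d)


def association_response(t: float, concentration: float, k_on: float,
                         k_off: float, r_max: float) -> float:
    """Association phase: R(t) = R_eq * (1 - exp(-(k_on * C + k_off) * t))."""
    k_d = dissociation_constant(k_on, k_off)
    r_eq = equilibrium_response(concentration, k_d, r_max)
    k_obs = k_on * concentration + k_off  # observed rate constant
    return r_eq * (1.0 - math.exp(-k_obs * t))
```

With hypothetical k_on = 1e5 M^-1 s^-1 and k_off = 1e-3 s^-1, K_D is 10 nM, and an analyte concentration equal to K_D gives half of R_max at equilibrium.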

The test compounds of the present invention can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the 'one-bead one-compound' library method; and synthetic library methods using affinity chromatography selection. The biological library approach is limited to polypeptide libraries, while the other four approaches are applicable to polypeptide, non-peptide oligomer or small molecule libraries of compounds (Lam, K.S. (1997) Anticancer Drug Des. 12:145).

Examples of methods for the synthesis of molecular libraries can be found in the art, for example in DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2061; and Gallop et al. (1994) J. Med. Chem. 37:1233. Libraries of compounds may be presented in solution (e.g., Houghten (1992) Biotechniques 13:412-421), or on beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556), bacteria (Ladner USP 5,223,409), spores (Ladner USP '409), plasmids (Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89:1865-1869) or phage (Scott and Smith (1990) Science 249:386-390; Devlin (1990) Science 249:404-406; Cwirla et al. (1990) Proc. Natl. Acad. Sci. USA 87:6378-6382; Felici (1991) J. Mol. Biol. 222:301-310; Ladner supra).

Candidate compounds include, for example, 1) peptides such as soluble peptides, including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., Lam et al. (1991) Nature 354:82-84; Houghten et al. (1991) Nature 354:84-86), and combinatorial chemistry-derived molecular libraries made of D- and/or L-configuration amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate, directed phosphopeptide libraries; see, e.g., Songyang et al. (1993) Cell 72:767-778); 3) antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single chain antibodies, as well as Fab, F(ab')2, Fab expression library fragments, and epitope-binding fragments of antibodies); 4) small organic and inorganic molecules (e.g., molecules obtained from combinatorial and natural product libraries); and 5) substrate analogs, including, but not limited to, the substrates disclosed herein.

One candidate compound is a soluble full-length sulfatase or fragment that competes for substrate. Other candidate compounds include mutant sulfatases or appropriate fragments containing mutations that affect sulfatase function and compete for substrate. Accordingly, a fragment that competes for substrate, for example with a higher affinity, or a fragment that binds substrate but does not process or otherwise affect it, is encompassed by the invention.

The invention provides other end points to identify compounds that modulate (stimulate or inhibit) sulfatase activity. The assays typically involve an assay of cellular events that indicate sulfatase activity. Thus, the expression of genes that are up- or down-regulated in response to sulfatase activity can be assayed. In one embodiment, the regulatory region of such genes can be operably linked to an easily detectable marker, such as luciferase. Alternatively, modification of the sulfatase could also be measured.

Any of the biological or biochemical functions mediated by the sulfatase can be used as an endpoint assay. These include any of the biochemical or biochemical/biological events described herein or in any reference cited herein, incorporated by reference for these endpoint assay targets, and other functions known to those of ordinary skill in the art. Specific end points can include, but are not limited to, the events resulting from expression (or lack thereof) of sulfatase activity. With respect to disorders, this would include, but not be limited to, effects on function, differentiation, and proliferation, which can be assayed, as well as the biological effects of function, such as the disorders discussed hereinabove and in the references cited hereinabove, which are incorporated herein by reference for the disorders disclosed in those references and for other disorders and pathology. In the case of the 22438 sulfatase, models of pain can be used as an end point. In the case of the 23553 and 25278 sulfatases, tumor progression can be used as an end point. In the case of the 26212 sulfatase, tumor angiogenesis and/or tumor progression can be used as an end point.

Binding and/or activating compounds can also be screened by using chimeric sulfatase proteins in which one or more regions, segments, sites, and the like, as disclosed herein, or parts thereof, are replaced by heterologous and homologous counterparts derived from other sulfatases. For example, a catalytic region can be used that has a different substrate specificity and/or affinity than that of the native sulfatase. Accordingly, a different set of components is available as an end-point assay for activation. As a further alternative, the site of modification by an effector protein, for example, for activation or phosphorylation, can be replaced with the site for a different effector protein. Activation can also be detected by a reporter gene containing an easily detectable coding region operably linked to a transcriptional regulatory sequence that is part of the native pathway in which the sulfatase is involved.

Sulfatase polypeptides are also useful in competition binding assays in methods
designed to discover compounds that interact with the sulfatase. Thus, a compound is
exposed to a sulfatase polypeptide under conditions that allow the compound to bind or
to otherwise interact with the polypeptide. Soluble sulfatase polypeptide is also added to
the mixture. If the test compound interacts with the soluble sulfatase polypeptide, it
decreases the amount of complex formed or activity from the sulfatase target. This type
of assay is particularly useful in cases in which compounds are sought that interact with
specific regions of the sulfatase. Thus, the soluble polypeptide that competes with the
target sulfatase region is designed to contain peptide sequences corresponding to the
region of interest.

Another type of competition-binding assay can be used to discover compounds
that interact with specific functional sites. As an example, a bindable substrate analog and
a candidate compound can be added to a sample of the sulfatase. Compounds that
interact with the sulfatase at the same site as the substrate or analog will reduce the
amount of complex formed between the sulfatase and the substrate or analog.
Accordingly, it is possible to discover a compound that specifically prevents interaction
between the sulfatase and the substrate or analog. Another example involves adding a
candidate compound to a sample of sulfatase and cleavable substrate. A compound that
competes with the substrate will reduce the amount of hydrolysis or binding of the
substrate to the sulfatase. Accordingly, compounds can be discovered that directly interact
with the sulfatase and compete with the substrate. Such assays can involve any other
component that interacts with the sulfatase.
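
One standard way to reason about the substrate-competition format above is the textbook Michaelis-Menten rate law for a purely competitive inhibitor: v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). This section does not specify a kinetic model, so the choice of that model and every parameter value below are illustrative assumptions. The sketch shows two things the assay relies on. A compound that competes with the substrate reduces hydrolysis, and the reduction shrinks as substrate concentration rises.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial hydrolysis rate with substrate s and competitive inhibitor i."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def percent_inhibition(s, i, vmax, km, ki):
    """Percent reduction in rate caused by inhibitor concentration i."""
    uninhibited = competitive_rate(s, 0.0, vmax, km, ki)
    inhibited = competitive_rate(s, i, vmax, km, ki)
    return 100.0 * (1.0 - inhibited / uninhibited)


# Arbitrary units: Vmax = Km = Ki = 1, inhibitor fixed at 1.
for substrate in (1.0, 10.0):
    print(substrate, round(percent_inhibition(substrate, 1.0, 1.0, 1.0, 1.0), 1))
```

Running the same compound at a higher substrate concentration yields weaker apparent inhibition. That behavior is characteristic of competition at the substrate site.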

To perform cell-free drug screening assays, it is desirable to immobilize either the
sulfatase, a fragment thereof, or its target molecule to facilitate separation of complexes
from uncomplexed forms of one or both of the proteins, as well as to accommodate
automation of the assay.

Techniques for immobilizing proteins on matrices can be used in the drug
screening assays. In one embodiment, a fusion protein can be provided which adds a
domain that allows the protein to be bound to a matrix. For example, glutathione-S-transferase/sulfatase
fusion proteins can be adsorbed onto glutathione sepharose beads
(Sigma Chemical, St. Louis, MO) or glutathione-derivatized microtitre plates, which are
then combined with the cell lysates (e.g., ³⁵S-labeled) and the candidate compound, and
the mixture is incubated under conditions conducive to complex formation (e.g., at
physiological conditions for salt and pH). Following incubation, the beads are washed to
remove any unbound label, and the radiolabel is determined directly in the immobilized
matrix, or in the supernatant after the complexes are dissociated. Alternatively, the
complexes can be dissociated from the matrix, separated by SDS-PAGE, and the level of
sulfatase-binding protein found in the bead fraction quantitated from the gel using
standard electrophoretic techniques.

Either the polypeptide or its target molecule can also be immobilized utilizing
conjugation of biotin and streptavidin using techniques well known in the art.
Alternatively, antibodies reactive with the protein but which do not interfere with binding
of the protein to its target molecule can be derivatized to the wells of the plate, and the
protein trapped in the wells by antibody conjugation. Preparations of a sulfatase-binding
target component, such as substrate or activating enzyme, and a candidate compound are
incubated in sulfatase-presenting wells, and the amount of complex trapped in the well
can be quantitated. Methods for detecting such complexes, in addition to those described
above for the GST-immobilized complexes, include immunodetection of complexes using
antibodies reactive with the sulfatase target molecule, or which are reactive with the
sulfatase and compete with the target molecule, as well as enzyme-linked assays which
rely on detecting an enzymatic activity associated with the target molecule.

Modulators of sulfatase activity identified according to these drug screening
assays can be used to treat a subject with a disorder related to the sulfatase by treating
cells that express the sulfatase. These methods of treatment include the steps of
administering the modulators of sulfatase activity in a pharmaceutical composition, as
described herein, to a subject in need of such treatment.

The 23553, 25278, and 26212 sulfatases are differentially expressed in tumor
cells, as disclosed herein. Accordingly, these sulfatases are relevant to these tumors
and relevant as well to the differentiation, function, and growth of the tissues giving rise to
the tumors. The 22438 sulfatase is expressed as described above and accordingly is
relevant for disorders involving these tissues. Disorders include, but are not limited to,
those discussed hereinabove. Moreover, since the gene is expressed in the central
nervous system, this sulfatase is relevant for the treatment of pain.

Sulfatase polypeptides are thus useful for treating a sulfatase-associated disorder
characterized by aberrant expression or activity of a sulfatase. "Aberrant expression"
or "misexpression", as used herein, refers to a non-wild-type pattern of gene
expression, at the RNA or protein level. It includes: expression at non-wild-type
levels, i.e., over- or under-expression; a pattern of expression that differs from wild
type in terms of the time or stage at which the gene is expressed, e.g., increased or
decreased expression (as compared with wild type) at a predetermined developmental
period or stage; a pattern of expression that differs from wild type in terms of
decreased expression (as compared with wild type) in a predetermined cell type or
tissue type; a pattern of expression that differs from wild type in terms of the splicing,
size, amino acid sequence, post-translational modification, or biological activity of the
expressed polypeptide; and a pattern of expression that differs from wild type in terms of
the effect of an environmental stimulus or extracellular stimulus on expression of the
gene, e.g., a pattern of increased or decreased expression (as compared with wild
type) in the presence of an increase or decrease in the strength of the stimulus.
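
The first criterion in this definition, expression at non-wild-type levels, can be illustrated with a log2 fold-change comparison against a wild-type reference. The sketch below covers only that level-based criterion and does not address timing, cell type, splicing or stimulus response. The function name and the two-fold default cut-off are hypothetical choices, not values taken from the disclosure.

```python
import math


def classify_expression(sample_level, wild_type_level, threshold=1.0):
    """Label a sample level relative to wild type.

    threshold is a hypothetical cut-off on |log2(sample / wild type)|;
    the default of 1.0 corresponds to a two-fold change.
    """
    if sample_level <= 0 or wild_type_level <= 0:
        raise ValueError("expression levels must be positive")
    log2_ratio = math.log2(sample_level / wild_type_level)
    if log2_ratio >= threshold:
        return "over-expression"
    if log2_ratio <= -threshold:
        return "under-expression"
    return "within wild-type range"


print(classify_expression(40.0, 10.0))  # four-fold above wild type
print(classify_expression(4.0, 10.0))   # 2.5-fold below wild type
print(classify_expression(12.0, 10.0))  # small change, below the cut-off
```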

In one embodiment, the method involves administering an agent (e.g., an
agent identified by a screening assay described herein), or combination of agents, that
modulates (e.g., upregulates or downregulates) expression or activity of the protein.
In another embodiment, the method involves administering sulfatase as therapy to
compensate for reduced or aberrant expression or activity of the protein.

Methods for treatment include, but are not limited to, the use of soluble sulfatase
or fragments of sulfatase protein that compete for substrate or any other component that
directly interacts with sulfatase, or any of the enzymes that modify the sulfatase. These
sulfatases or fragments can have a higher affinity for the target so as to provide effective
competition.

Stimulation of activity is desirable in situations in which the protein is
abnormally downregulated and/or in which increased activity is likely to have a
beneficial effect. Likewise, inhibition of activity is desirable in situations in which
the protein is abnormally upregulated and/or in which decreased activity is likely to
have a beneficial effect. In one example of such a situation, a subject has a disorder
characterized by aberrant development or cellular differentiation. In another example,
the subject has a disorder characterized by an aberrant hematopoietic response. In
another example, it is desirable to achieve tissue regeneration in a subject.

In yet another aspect of the invention, the proteins of the invention can be used
as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent
No. 5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol.
Chem. 268:12046-12054; Bartel et al. (1993) Biotechniques 14:920-924; Iwabuchi et
al. (1993) Oncogene 8:1693-1696; and Brent WO 94/10300) to identify other
proteins (captured proteins) that bind to or interact with the proteins of the
invention and modulate their activity.

Sulfatase polypeptides also are useful to provide a target for diagnosing a disease
or predisposition to disease mediated by the sulfatase, including, but not limited to, those
diseases disclosed herein, in the references cited herein, and as disclosed above in the
background. Accordingly, methods are provided for detecting the presence, or levels, of
the sulfatase in a cell, tissue, or organism. The method involves contacting a biological
sample with a compound capable of interacting with the sulfatase such that the
interaction can be detected. One agent for detecting a sulfatase is an antibody capable of
selectively binding to the sulfatase. A biological sample includes tissues, cells, and
biological fluids isolated from a subject, as well as tissues, cells, and fluids present within
a subject.

The sulfatase also provides a target for diagnosing active disease, or
predisposition to disease, in a patient having a variant sulfatase. Thus, sulfatase can be
isolated from a biological sample and assayed for the presence of a genetic mutation that
results in an aberrant protein. This includes amino acid substitution, deletion, insertion,
rearrangement (as the result of aberrant splicing events), and inappropriate
post-translational modification. Analytic methods include detection of altered
electrophoretic mobility, altered tryptic peptide digest, altered sulfatase activity in
cell-based or cell-free assays (such as alteration in substrate binding or degradation, or in
the ability to be activated by the activating enzyme), altered antibody-binding pattern,
altered isoelectric point, direct amino acid sequencing, and any other of the known assay
techniques useful for detecting mutations in a protein in general or in a sulfatase
specifically, such as are disclosed herein.
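
To show how an "altered tryptic peptide digest" can reveal a variant, the sketch below applies the commonly used simplified trypsin rule. Under that rule, trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and missed cleavages are ignored. The rule is a standard approximation rather than anything specified here. The two sequences are hypothetical and are not sulfatase sequences.

```python
def tryptic_digest(protein):
    """Split a protein at the simplified trypsin rule: cleave after K or R
    unless the next residue is P (missed cleavages are not modeled)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Hypothetical wild-type and variant sequences differing by one substitution.
wild_type = "AKGRPLKW"
variant = "AQGRPLKW"
print(tryptic_digest(wild_type))  # ['AK', 'GRPLK', 'W']
print(tryptic_digest(variant))    # ['AQGRPLK', 'W']
print(sorted(set(tryptic_digest(wild_type)) ^ set(tryptic_digest(variant))))
```

Because the K-to-Q substitution removes a cleavage site, the variant yields a longer peptide. That altered fragment pattern is the kind of difference the digest comparison detects.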

In vitro techniques for detection of sulfatase include enzyme-linked
immunosorbent assays (ELISAs), Western blots, immunoprecipitations, and
immunofluorescence. Alternatively, the protein can be detected in vivo in a subject by
introducing into the subject a labeled anti-sulfatase antibody. For example, the antibody
can be labeled with a radioactive marker whose presence and location in a subject can be
detected by standard imaging techniques. Particularly useful are methods that detect
the allelic variant of sulfatase expressed in a subject and methods that detect
fragments of sulfatase in a sample.

Sulfatase polypeptides are also useful in pharmacogenomic analysis.
Pharmacogenomics deals with clinically significant hereditary variations in the response
to drugs due to altered drug disposition and abnormal action in affected persons. See,
e.g., Eichelbaum, M. (1996) Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985, and
Linder, M.W. (1997) Clin. Chem. 43(2):254-266. These variations can result in
severe toxicity of therapeutic drugs in certain individuals or therapeutic failure of drugs
in certain individuals as a result of individual variation in metabolism. Thus, the
genotype of the individual can determine the way a therapeutic compound acts on the
body or the way the body metabolizes the compound. Further, the activity of
drug-metabolizing enzymes affects both the intensity and duration of drug action. Thus,
the pharmacogenomics of the individual permit the selection of effective compounds and
effective dosages of such compounds for prophylactic or therapeutic treatment based on
the individual's genotype. The discovery of genetic polymorphisms in some
drug-metabolizing enzymes has explained why some patients do not obtain the expected
drug effects, show an exaggerated drug effect, or experience serious toxicity from
standard drug dosages. Polymorphisms can be expressed in the phenotype of the
extensive metabolizer and the phenotype of the poor metabolizer. Accordingly, genetic
polymorphism may lead to allelic protein variants of sulfatase in which one or more of
the sulfatase functions in one population is different from those in another population.
The polypeptides thus provide a target for ascertaining a genetic predisposition that can
affect treatment modality. Thus, in a peptide-based treatment, polymorphism may give
rise to catalytic regions that are more or less active. Accordingly, dosage would
necessarily be modified to maximize the therapeutic effect within a given population
containing the polymorphism. As an alternative to genotyping, specific polymorphic
polypeptides could be identified.

Sulfatase polypeptides are also useful for monitoring therapeutic effects during
clinical trials and other treatment. Thus, the therapeutic effectiveness of an agent that is
designed to increase or decrease gene expression, protein levels, or sulfatase activity can
be monitored over the course of treatment using sulfatase polypeptides as an end-point
target. The monitoring can be, for example, as follows: (i) obtaining a pre-administration
sample from a subject prior to administration of the agent; (ii) detecting the level of
expression or activity of the protein in the pre-administration sample; (iii) obtaining one
or more post-administration samples from the subject; (iv) detecting the level of
expression or activity of the protein in the post-administration samples; (v) comparing
the level of expression or activity of the protein in the pre-administration sample with
that in the post-administration sample or samples; and (vi) increasing or decreasing the
administration of the agent to the subject accordingly.
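
Steps (v) and (vi) leave the decision rule open. The sketch below shows one way such a comparison might be expressed. The ratio of the mean post-administration level to the pre-administration level is converted into a suggested direction of dose change. The function name and the `min_change` and `max_change` bounds are hypothetical parameters chosen for illustration, and the output is not a clinical dosing rule.

```python
from statistics import mean


def recommend_adjustment(pre_level, post_levels, goal, min_change=0.2, max_change=0.6):
    """Compare post- with pre-administration levels (steps (v) and (vi)).

    goal is "decrease" or "increase", the intended effect of the agent.
    min_change and max_change are hypothetical fractional bounds on the
    desired change relative to the pre-administration level.
    """
    if goal not in ("decrease", "increase"):
        raise ValueError("goal must be 'decrease' or 'increase'")
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    ratio = mean(post_levels) / pre_level
    # Fractional change in the intended direction.
    change = 1.0 - ratio if goal == "decrease" else ratio - 1.0
    if change < min_change:
        return ratio, "increase administration"   # effect too small
    if change > max_change:
        return ratio, "decrease administration"   # effect overshoots
    return ratio, "maintain administration"


print(recommend_adjustment(100.0, [90.0, 95.0], goal="decrease"))
```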



Antibodies

The invention also provides antibodies that selectively bind to the sulfatase and
its variants and fragments. An antibody is considered to bind selectively even if it also
binds to other proteins that are not substantially homologous with the sulfatase. These
other proteins share homology with a fragment or domain of the sulfatase. This
conservation in specific regions gives rise to antibodies that bind to both proteins by
virtue of the homologous sequence. In this case, it would be understood that antibody
binding to the sulfatase is still selective.

Antibodies can be polyclonal or monoclonal. An intact antibody, or a fragment
thereof (e.g., Fab or F(ab')2), can be used. An appropriate immunogenic preparation can
be derived from native, recombinantly expressed, or chemically synthesized peptides.
To generate antibodies, an isolated sulfatase polypeptide is used as an
immunogen to generate antibodies using standard techniques for polyclonal and
monoclonal antibody preparation. Either the full-length protein or an antigenic peptide
fragment can be used. Regions having a high antigenicity index are disclosed
hereinabove.

Antibodies are preferably prepared from these regions or from discrete
fragments in these regions. However, antibodies can be prepared from any region of
the peptide as described herein. A preferred fragment produces an antibody that
diminishes or completely prevents substrate hydrolysis or binding. Antibodies can be
developed against the entire sulfatase or domains of the sulfatase as described herein,
for example, the substrate binding region, sulfatase motif, or subregions thereof.
Antibodies can also be developed against other specific functional sites as disclosed
herein.

The antigenic peptide can comprise a contiguous sequence of at least 12, 14, 15-20,
20-25, or 25-30 or more amino acid residues. In one embodiment, fragments
correspond to regions that are located on the surface of the protein, e.g., hydrophilic
regions. These fragments are not to be construed, however, as encompassing any
fragments that may have been disclosed prior to the invention.
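
Hydrophilic, likely surface-exposed stretches are often located with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale as a stand-in technique. It is not the antigenicity-index method referred to above. The 12-residue default window echoes the minimum peptide length stated in the text, and the zero cut-off and the test sequence are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_windows(protein, window=12, cutoff=0.0):
    """Return (start, mean hydropathy) for each window whose mean is below
    cutoff. Starts are 0-based; negative means indicate hydrophilic stretches."""
    protein = protein.upper()
    results = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score < cutoff:
            results.append((start, score))
    return results


# Hypothetical peptide: a hydrophobic stretch followed by a hydrophilic one.
peptide = "IIIIIIIIIIIIKKKKKKKKKKKK"
for start, score in hydrophilic_windows(peptide):
    print(start + 1, round(score, 2))  # 1-based window start and mean score
```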

Detection can be facilitated by coupling (i.e., physically linking) the antibody to
a detectable substance. Examples of detectable substances include various enzymes,
prosthetic groups, fluorescent materials, luminescent materials, bioluminescent
materials, and radioactive materials. Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin;
examples of suitable fluorescent materials include umbelliferone, fluorescein,
fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl
chloride, or phycoerythrin; an example of a luminescent material includes luminol;
examples of bioluminescent materials include luciferase, luciferin, and aequorin; and
examples of suitable radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S, or ³H.



Antibody Uses

The antibodies can be used to isolate a sulfatase by standard techniques, such as 
affinity chromatography or immunoprecipitation. The antibodies can facilitate the 
purification of the natural sulfatase from cells and recombinantly produced sulfatase 
expressed in host cells. 

The antibodies are useful to detect the presence of a sulfatase in cells or tissues to
determine the pattern of expression of the sulfatase among various tissues in an organism
and over the course of normal development. The antibodies can be used to detect a
sulfatase in situ, in vitro, or in a cell lysate or supernatant in order to evaluate the
abundance and pattern of expression. Antibody detection of circulating fragments of the
full-length sulfatase can be used to identify sulfatase turnover. In addition, the antibodies
can be used to assess abnormal tissue distribution or abnormal expression during
development.

Further, the antibodies can be used to assess sulfatase expression in disease states,
such as in active stages of the disease or in an individual with a predisposition toward
disease related to sulfatase function. When a disorder is caused by an inappropriate
tissue distribution, developmental expression, or level of expression of sulfatase protein,
the antibody can be prepared against the normal sulfatase protein. If a disorder is
characterized by a specific mutation in sulfatase, antibodies specific for this mutant
protein can be used to assay for the presence of the specific mutant sulfatase.
Intracellularly made antibodies ("intrabodies"), which would recognize intracellular
sulfatase peptide regions, are also encompassed.

The antibodies can also be used to assess normal and aberrant subcellular
localization of the sulfatase in cells of the various tissues in an organism. Antibodies can
be developed against the whole sulfatase or portions of the sulfatase.

The diagnostic uses can be applied not only in genetic testing but also in
monitoring a treatment modality. Accordingly, where treatment is ultimately aimed at
correcting the sulfatase expression level or the presence of aberrant sulfatases and aberrant
tissue distribution or developmental expression, antibodies directed against the sulfatase
or relevant fragments can be used to monitor therapeutic efficacy.

Additionally, antibodies are useful in pharmacogenomic analysis. Thus,
antibodies prepared against polymorphic sulfatase can be used to identify individuals
that require modified treatment modalities.

The antibodies are also useful as diagnostic tools, serving as an immunological marker
for aberrant sulfatase analyzed by electrophoretic mobility, isoelectric point, tryptic
peptide digest, and other physical assays known to those in the art.

The antibodies are also useful for tissue typing. Thus, where a specific sulfatase
has been correlated with expression in a specific tissue, antibodies that are specific for
this sulfatase can be used to identify a tissue type.

The antibodies are also useful in forensic identification. Accordingly, where an
individual has been correlated with a specific genetic polymorphism resulting in a
specific polymorphic protein, an antibody specific for the polymorphic protein can be
used as an aid in identification.

The antibodies are also useful for inhibiting sulfatase function, such as
substrate binding or sulfatase activity. For example, sulfatase activity may be measured
by the ability to form a binding complex with a sulfated conjugate, such as disclosed
herein.

These uses can also be applied in a therapeutic context in which treatment
involves inhibiting sulfatase function. An antibody can be used, for example, to block
substrate binding. Antibodies can be prepared against specific fragments containing
sites required for function or against intact sulfatase associated with a cell.

Completely human antibodies are particularly desirable for therapeutic treatment
of human patients. For an overview of this technology for producing human
antibodies, see Lonberg et al. (1995) Int. Rev. Immunol. 13:65-93. For a detailed
discussion of this technology for producing human antibodies and human monoclonal
antibodies and protocols for producing such antibodies, see, e.g., U.S. Patent Nos.
5,625,126; 5,633,425; 5,569,825; 5,661,016; and 5,545,806.

The invention also encompasses kits for using antibodies to detect the presence
of a sulfatase protein in a biological sample. The kit can comprise antibodies such as a
labeled or labelable antibody and a compound or agent for detecting the sulfatase in a
biological sample; means for determining the amount of sulfatase in the sample; and
means for comparing the amount of sulfatase in the sample with a standard. The
compound or agent can be packaged in a suitable container. The kit can further
comprise instructions for using the kit to detect the sulfatase.
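
One common way to implement the kit's "means for determining the amount" and "comparing ... with a standard" is to read sample signals off a standard curve. The sketch below uses plain linear interpolation between bracketing standards. Real immunoassay analyses often fit a four-parameter logistic curve instead. The input format, the standard values and the reference amount are all hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(signal, standards):
    """Estimate concentration by linear interpolation on a standard curve.

    standards: (concentration, signal) pairs whose signals increase
    monotonically with concentration (a hypothetical input format).
    """
    points = sorted(standards, key=lambda point: point[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        # Extrapolating beyond the standards is unreliable, so refuse.
        raise ValueError("signal outside the range of the standards")
    idx = bisect_left(signals, signal)
    if signals[idx] == signal:
        return points[idx][0]
    (c_low, s_low), (c_high, s_high) = points[idx - 1], points[idx]
    return c_low + (c_high - c_low) * (signal - s_low) / (s_high - s_low)


# Made-up standards: (ng/mL, absorbance).
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
sample = interpolate_concentration(0.35, standards)
reference = 12.0  # hypothetical reference amount for the comparison step
print(f"{sample:.1f} ng/mL; above reference: {sample > reference}")
```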

Polynucleotides 

The nucleotide sequences in SEQ ID NOS:2, 4, 6, and 8 were obtained by
sequencing the deposited human cDNAs. Accordingly, the sequences of the deposited
clones are controlling as to any discrepancies between the two, and any reference to a
sequence of SEQ ID NOS:2, 4, 6, or 8 includes reference to the sequence of the
deposited cDNA.

The specifically disclosed cDNAs comprise the coding regions and the 5' and 3'
untranslated sequences in SEQ ID NOS:2, 4, 6, and 8. The coding sequences of the
cDNAs are set forth in SEQ ID NOS:11, 12, 13, and 14.
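
The relationship between a full cDNA and its coding sequence can be illustrated by locating the longest ATG-initiated open reading frame between the 5' and 3' untranslated regions. The sketch below scans the three forward frames only and takes the longest ORF. This is a simplifying assumption, not the method used to define SEQ ID NOS:11-14. The short test sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Return (start, end) of the longest ATG-initiated ORF in the three
    forward frames, 0-based with end exclusive and including the stop codon,
    or None if no complete ORF is found."""
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


cdna = "GGCATGAAATTTTAGCC"  # hypothetical: 5' UTR, short ORF, 3' UTR
span = longest_orf(cdna)
print(span, cdna[span[0]:span[1]])
```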

The invention provides isolated polynucleotides encoding the novel sulfatases.
The term "sulfatase polynucleotide" or "sulfatase nucleic acid" refers to the sequences
shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or in the deposited cDNAs. The
term "sulfatase polynucleotide" or "sulfatase nucleic acid" further includes variants and
fragments of sulfatase polynucleotides.

Generally, nucleotide sequence variants of the invention will have at least
60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identity to one of the nucleotide sequences disclosed herein.
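
Percent identity can be computed in several ways, and this passage does not fix an algorithm. The sketch below assumes the two sequences have already been aligned with "-" as the gap character. It counts identical non-gap columns and divides by the number of aligned columns, ignoring columns that are gaps in both sequences. That denominator convention is an assumption, and other conventions give different values. The thresholds listed are the ones recited above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; the denominator is
    the number of remaining aligned columns (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments; checks which recited thresholds are met.
pid = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
thresholds = [60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
print(pid, [t for t in thresholds if pid >= t])
```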

An "isolated" sulfatase nucleic acid is one that is separated from other nucleic
acid present in the natural source of the sulfatase nucleic acid. Preferably, an "isolated"
nucleic acid is free of sequences that naturally flank the sulfatase nucleic acid (i.e.,
sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the
organism from which the nucleic acid is derived. However, there can be some flanking
nucleotide sequences, for example up to about 5 kb. The important point is that the
sulfatase nucleic acid is isolated from flanking sequences such that it can be subjected to
the specific manipulations described herein, such as recombinant expression, preparation
of probes and primers, and other uses specific to the sulfatase nucleic acid sequences. In
one embodiment, the sulfatase nucleic acid comprises only the coding region.

Moreover, an "isolated" nucleic acid molecule, such as a cDNA or RNA
molecule, can be substantially free of other cellular material, or culture medium when
produced by recombinant techniques, or chemical precursors or other chemicals when
chemically synthesized. However, the nucleic acid molecule can be fused to other
coding or regulatory sequences and still be considered isolated.

In some instances, the isolated material will form part of a composition (for 
example, a crude extract containing other substances), buffer system or reagent mix. 
In other circumstances, the material may be purified to essential homogeneity, for 
example as determined by PAGE or column chromatography such as HPLC.

Preferably, an isolated nucleic acid comprises at least about 50, 80 or 90% (on a 
molar basis) of all macromolecular species present. 

For example, recombinant DNA molecules contained in a vector are considered
isolated. Further examples of isolated DNA molecules include recombinant DNA
molecules maintained in heterologous host cells or purified (partially or substantially)
DNA molecules in solution. Isolated RNA molecules include in vivo or in vitro RNA
transcripts of the isolated DNA molecules of the present invention. Isolated nucleic acid
molecules according to the present invention further include such molecules produced
synthetically.

Sulfatase polynucleotides can encode the mature protein plus additional amino- or
carboxy-terminal amino acids, or amino acids interior to the mature polypeptide (when
the mature form has more than one polypeptide chain, for instance). Such sequences
may play a role in processing of a protein from precursor to a mature form, facilitate
protein trafficking, prolong or shorten protein half-life, or facilitate manipulation of a
protein for assay or production, among other things. As generally is the case in situ, the
additional amino acids may be processed away from the mature protein by cellular
enzymes.

Sulfatase polynucleotides include, but are not limited to, the sequence encoding
the mature polypeptide alone; the sequence encoding the mature polypeptide and
additional coding sequences, such as a leader or secretory sequence (e.g., a pre-pro or
pro-protein sequence); and the sequence encoding the mature polypeptide, with or without
the additional coding sequences, plus additional non-coding sequences, for example
introns and non-coding 5' and 3' sequences such as transcribed but non-translated
sequences that play a role in transcription, mRNA processing (including splicing and
polyadenylation signals), ribosome binding, and stability of mRNA. In addition, the
polynucleotide may be fused to a marker sequence encoding, for example, a peptide that
facilitates purification.

Sulfatase polynucleotides can be in the form of RNA, such as mRNA, or in the
form of DNA, including cDNA and genomic DNA obtained by cloning or produced by
chemical synthetic techniques or by a combination thereof. The nucleic acid, especially
DNA, can be double-stranded or single-stranded. Single-stranded nucleic acid can be
the coding strand (sense strand) or the non-coding strand (antisense strand).

The invention further provides variant sulfatase polynucleotides, and fragments
thereof, that differ from the nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11,
12, 13, or 14 due to degeneracy of the genetic code and thus encode the same protein as
that encoded by a nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or
14.
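
Degeneracy of the genetic code means that different codons can specify the same amino acid. Two different nucleotide sequences can therefore encode an identical protein. The sketch below builds the standard codon table and translates two hypothetical DNA sequences that differ at synonymous positions into the same peptide. The sequences are illustrative only, and "*" marks stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical synonymous variants: CTG/TTA (Leu), TCT/TCA (Ser), AAA/AAG (Lys).
original = "ATGCTGTCTAAA"
variant = "ATGTTATCAAAG"
print(translate(original), translate(variant))  # MLSK MLSK
```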

Alternatively, a nucleic acid molecule that is a fragment of a 22438-like
nucleotide sequence of the present invention comprises a nucleotide sequence
consisting of nucleotides 1-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700,
700-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500,
1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, or 2100-2175 of
SEQ ID NO:2.

A nucleic acid molecule that is a fragment of a 23553-like nucleotide sequence
of the present invention comprises a nucleotide sequence consisting of nucleotides
1-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000,
1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700,
1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400,
2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, 2900-3000, 3000-3100,
3100-3200, 3200-3300, 3300-3400, 3400-3500, 3500-3600, 3600-3700, 3700-3800,
3800-3900, 3900-4000, 4000-4100, 4100-4200, 4200-4300, or 4300-4321 of SEQ ID
NO:4.

A nucleic acid molecule that is a fragment of a 25278-like nucleotide sequence
of the present invention comprises a nucleotide sequence consisting of nucleotides
1-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000,
1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700,
1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400,
2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, or 2900-2940 of SEQ ID
NO:6.

A nucleic acid molecule that is a fragment of a 26212-like nucleotide sequence of the
present invention comprises a nucleotide sequence consisting of nucleotides 1-100,
100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000, 1000-1100,
1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700-1800,
1800-1900, 1900-2000, 2000-2100, 2100-2200, or 2200-2253 of SEQ ID NO:8.
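
The fragment ranges above follow a regular pattern. Consecutive 100-nucleotide ranges share a boundary nucleotide, and the last range ends at the final nucleotide of the sequence. The sketch below enumerates coordinates in that style from a sequence length. It produces separate 700-800 and 800-900 ranges, whereas the lists above recite a single 700-900 range. The helper name is hypothetical.

```python
def fragment_bounds(length, step=100):
    """Return 1-based (start, end) pairs in the style 1-100, 100-200, ...

    Adjacent ranges share a boundary nucleotide, and the final range ends
    at the last nucleotide of the sequence.
    """
    if length < 1:
        raise ValueError("length must be positive")
    bounds = [1] + list(range(step, length, step)) + [length]
    return list(zip(bounds, bounds[1:]))


# Sequence lengths taken from the final ranges recited above.
for seq_id, length in (("SEQ ID NO:2", 2175), ("SEQ ID NO:8", 2253)):
    ranges = fragment_bounds(length)
    print(seq_id, ranges[0], ranges[-1], len(ranges))
```
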

The invention also provides sulfatase nucleic acid molecules encoding the
variant polypeptides described herein. Such polynucleotides may be naturally occurring,
such as allelic variants (same locus), homologs (different locus), and orthologs (different
organism), or may be constructed by recombinant DNA methods or by chemical
synthesis. Such non-naturally occurring variants may be made by mutagenesis
techniques, including those applied to polynucleotides, cells, or organisms. Accordingly,
as discussed above, the variants can contain nucleotide substitutions, deletions,
inversions, and insertions.

Typically, variants have a substantial identity with a nucleic acid molecule of
SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, and the complements thereof. Variation can
occur in either or both the coding and non-coding regions. The variations can produce
both conservative and non-conservative amino acid substitutions.

Orthologs, homologs, and allelic variants can be identified using methods well
known in the art. These variants comprise a nucleotide sequence encoding a sulfatase
that is typically at least about 40-45%, 45-50%, 50-55%, 55-60%, 60-65%, 65-70%,
70-75%, more typically at least about 75-80% or 80-85%, and most typically at least about
85-90% or 90-95% or more homologous to the nucleotide sequence shown in SEQ ID
NOS:2, 4, 6 or 8, or a fragment of this sequence. Such nucleic acid molecules can
readily be identified as being able to hybridize under stringent conditions to the
nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or a fragment of
the sequence.

In the case of the 23553 sulfatase, in one embodiment, a variant is greater than
65% homologous with respect to nucleotide sequence. For the 25278 sulfatase, in one
embodiment, a variant is greater than 50-60% homologous with respect to nucleotide
sequence. With respect to the 26212 sulfatase, in one embodiment, a variant is greater
than about 65-75% homologous with respect to nucleotide sequence.
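
The percentage thresholds above (for example, greater than 65% for a 23553 variant)
presuppose some way of scoring two nucleotide sequences against each other. As a
minimal sketch only, the Python function below computes percent identity over two
sequences that have already been aligned to equal length, skipping columns in which
either sequence carries a gap ('-'). The function name, the gap convention, and the
toy sequences are assumptions for illustration; in practice the alignment would come
from a tool such as BLAST, and different programs count gaps differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gap character '-').

    Hypothetical helper: columns where either sequence has a gap are
    excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a nucleotide.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy sequences, not taken from any SEQ ID NO of the invention.
score = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
print(score, score > 65.0)  # prints: 80.0 True
```

Excluding gap columns is only one convention; counting them as mismatches would
lower the score for gapped alignments.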

It is understood that stringent hybridization does not indicate substantial
homology where it is due to general homology, such as poly(A) sequences, or sequences
common to all or most proteins, sulfatases, arylsulfatases, glucosamine-6-sulfatases,
N-acetylgalactosamine-4-sulfatases, or any of the sulfatases to which the sulfatases of the
present invention have shown homology by BLAST analysis, for example, regions of
homology to arylsulfatases A, B, C, D, E, F, IDS, and the like. Moreover, it is understood
that variants do not include any of the nucleic acid sequences that may have been
disclosed prior to the invention.

As used herein, the term "hybridizes under stringent conditions" describes
conditions for hybridization and washing. Stringent conditions are known to those
skilled in the art and can be found in Current Protocols in Molecular Biology, John
Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Aqueous and nonaqueous methods are
described in that reference and either can be used. A preferred example of stringent
hybridization conditions is hybridization in 6X sodium chloride/sodium citrate
(SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at
50°C. Another example of stringent hybridization conditions is hybridization in 6X
SSC at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 55°C.
A further example of stringent hybridization conditions is hybridization in 6X SSC at
about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C.
Preferably, stringent hybridization conditions are hybridization in 6X SSC at about
45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 65°C. Particularly
preferred stringency conditions (and the conditions that should be used if the
practitioner is uncertain about what conditions should be applied to determine if a
molecule is within a hybridization limitation of the invention) are 0.5M sodium
phosphate, 7% SDS at 65°C, followed by one or more washes at 0.2X SSC, 1% SDS at
65°C. Preferably, an isolated nucleic acid molecule of the invention that hybridizes
under stringent conditions to the sequence of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14
corresponds to a naturally-occurring nucleic acid molecule. As used herein, a
"naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having
a nucleotide sequence that occurs in nature (e.g., encodes a natural protein).
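
To make the buffer terms above concrete, the sketch below converts an SSC strength
into molar salt concentrations, taking 1X SSC as 0.15 M NaCl plus 0.015 M trisodium
citrate (the usual recipe). It then estimates a duplex melting temperature with one
commonly cited empirical approximation for long DNA, Tm = 81.5 + 16.6 log10[Na+] +
0.41(%GC) - 600/L, where L is the duplex length in nucleotides. That formula, its
constants, and the function names are assumptions for illustration only: published
variants use different length terms and corrections (for example for formamide), and
the estimate does not replace the stated hybridization and wash conditions.

```python
import math

NACL_PER_X = 0.15      # M NaCl in 1X SSC
CITRATE_PER_X = 0.015  # M trisodium citrate in 1X SSC


def ssc_composition(strength_x: float) -> dict:
    """Molar NaCl, citrate, and total Na+ for an SSC strength such as 6 or 0.2."""
    nacl = NACL_PER_X * strength_x
    citrate = CITRATE_PER_X * strength_x
    # Trisodium citrate contributes three Na+ ions per molecule.
    sodium = nacl + 3 * citrate
    return {"NaCl_M": nacl, "citrate_M": citrate, "Na_M": sodium}


def estimate_tm(sodium_m: float, gc_percent: float, length_nt: int) -> float:
    """Approximate Tm in degrees C (assumed empirical formula for long DNA)."""
    if sodium_m <= 0 or length_nt <= 0:
        raise ValueError("sodium concentration and length must be positive")
    return 81.5 + 16.6 * math.log10(sodium_m) + 0.41 * gc_percent - 600.0 / length_nt


hyb = ssc_composition(6)     # hybridization buffer
wash = ssc_composition(0.2)  # wash buffer
print(f"6X SSC: {hyb['Na_M']:.3f} M Na+; 0.2X SSC: {wash['Na_M']:.3f} M Na+")
print(f"Tm estimate, 500-nt probe, 50% GC, 0.2X SSC: "
      f"{estimate_tm(wash['Na_M'], 50, 500):.1f} C")
```

The sketch shows why the low-salt 0.2X SSC wash is the stringent step: lowering
[Na+] lowers the estimated Tm, so imperfect duplexes melt off at a given wash
temperature. Some calculations count only the NaCl contribution to [Na+].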

The present invention also provides isolated nucleic acids that contain a single-
or double-stranded fragment or portion that hybridizes under stringent conditions to
the nucleotide sequence of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or the
complements of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14. In one embodiment, the
nucleic acid consists of a portion of a nucleotide sequence of SEQ ID NOS:2, 4, 6, 8,
11, 12, 13, or 14 and the complements. The nucleic acid fragments of the invention
are at least about 10-15, preferably at least about 15-20 or 20-25 contiguous
nucleotides, and can be 30, 33, 35, 40, 50, 60, 70, 75, 80, 90, 100, 200, 500 or more
nucleotides in length. Longer fragments, for example, 600 or more nucleotides in
length, which encode antigenic proteins or polypeptides described herein are also
useful.
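
The fragment sizes above refer to runs of contiguous nucleotides taken from a sequence
or from its complement. As a minimal sketch, assuming plain A/C/G/T strings and
hypothetical helper names, the Python code below enumerates every window of k
contiguous nucleotides and builds the reverse complement so that the same enumeration
can be applied to the opposite strand. A sequence of length n has n - k + 1 such
windows.

```python
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3' (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def contiguous_fragments(seq: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, fragment) for every window of k contiguous nucleotides."""
    if not 1 <= k <= len(seq):
        raise ValueError(f"k must be between 1 and {len(seq)}")
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


toy = "ACGTTGCAAC"  # toy sequence, not a SEQ ID NO of the invention
for strand in (toy, reverse_complement(toy)):
    print([frag for _, frag in contiguous_fragments(strand, 8)])
```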

In the case of the 23553 sulfatase, in one embodiment, fragments are derived
from nucleotide 1 to about nucleotide 670 and comprise 5-10 and 10-20 contiguous
base pairs, and particularly greater than 18 base pairs. For this sulfatase, in another
embodiment, a fragment is derived from around nucleotide 3008 to 3514 and
comprises around 5-10 and 10-20 contiguous nucleotides. In other embodiments for
this sulfatase, a fragment is derived from around nucleotide 3994 to 4321 and is about
5-10 or 10-20 contiguous nucleotides. For the 25278 sulfatase, in one embodiment, a
fragment is derived from around nucleotide 130 to around nucleotide 454 and comprises
a contiguous sequence of about 5-10 or 10-20 nucleotides. In another embodiment, the
fragment is derived from around nucleotide 454 to around nucleotide 1400 and
comprises around 5-10 or 10-20 contiguous nucleotides, especially a fragment greater
than 17 nucleotides. In another embodiment the fragment is derived from around
nucleotide 1400 to around nucleotide 1850 and comprises a contiguous sequence of
around 5-10, 10-20, or 20-25 nucleotides, especially a fragment greater than 23
nucleotides. In another embodiment, a fragment is derived from about nucleotide
1933 to about nucleotide 2421. Such a fragment comprises around 5-10 or 10-20
contiguous nucleotides. For the 26212 sulfatase, in one embodiment, a fragment is
derived from around nucleotide 272 to around nucleotide 538 and comprises a
contiguous sequence of around 5-10 or 10-20 nucleotides, especially a fragment
greater than 17 nucleotides. In another embodiment, the fragment is derived from
around nucleotide 538 to around nucleotide 751 and comprises a contiguous sequence
of at least 5-10 or 10-20 nucleotides, especially greater than 12 nucleotides. In
another embodiment, the fragment is derived from around nucleotide 1074 to around
nucleotide 1551 and comprises a contiguous nucleotide sequence of around 5-10, 10-20,
or 20-30 nucleotides, especially greater than 20 nucleotides. In a further embodiment,
the fragment is derived from around nucleotide 2052 to 2251 and comprises a
contiguous sequence of 5-10 and 10-20 nucleotides, especially fragments greater than
18 nucleotides.
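
Each embodiment above pairs a region of the sequence with a range of fragment lengths.
A minimal sketch, assuming 1-based inclusive coordinates and treating the approximate
("around") region boundaries as exact, enumerates every contiguous fragment that lies
wholly within a region and counts them: a region of n nucleotides contains n - k + 1
fragments of length k. The helper names and the upper length bound used in the example
are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def region_fragments(seq: str, start: int, end: int, min_len: int,
                     max_len: Optional[int] = None) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, fragment) for fragments inside nucleotides start..end.

    Coordinates are 1-based and inclusive; fragment lengths run from min_len to
    max_len (defaulting to the whole region).
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region lies outside the sequence")
    region_len = end - start + 1
    if max_len is None:
        max_len = region_len
    if not 1 <= min_len <= max_len:
        raise ValueError("invalid fragment length bounds")
    region = seq[start - 1:end]
    for k in range(min_len, min(max_len, region_len) + 1):
        for i in range(region_len - k + 1):
            yield start + i, region[i:i + k]


def count_region_fragments(start: int, end: int, min_len: int, max_len: int) -> int:
    """Number of fragments region_fragments would yield, computed arithmetically."""
    n = end - start + 1
    return sum(n - k + 1 for k in range(min_len, min(max_len, n) + 1))


# Example: the 25278 region from about nucleotide 130 to 454, fragments greater
# than 17 nucleotides, with a hypothetical upper bound of 25 nucleotides.
print(count_region_fragments(130, 454, 18, 25))
```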

The fragment can comprise DNA or RNA and can be derived from either the
coding or the non-coding sequence.

In another embodiment an isolated sulfatase nucleic acid encodes the entire
coding region. In another embodiment the isolated sulfatase nucleic acid encodes a
sequence corresponding to the mature protein. Other fragments include nucleotide
sequences encoding the amino acid fragments described herein.

Thus, sulfatase nucleic acid fragments further include sequences corresponding
to the regions described herein, subregions also described, and specific functional sites.
Sulfatase nucleic acid fragments also include combinations of the regions, segments,
motifs, and other functional sites described above. It is understood that a sulfatase
fragment includes any nucleic acid sequence that does not include the entire gene. A
person of ordinary skill in the art would be aware of the many permutations that are
possible. Nucleic acid fragments, according to the present invention, are not to be
construed as encompassing those fragments that may have been disclosed prior to the
invention.

Where the locations of the regions or sites have been predicted by computer
analysis, one of ordinary skill would appreciate that the amino acid residues constituting
these regions can vary depending on the criteria used to define the regions.



Polynucleotide Uses

The nucleotide sequences of the present invention can be used as a "query 
sequence" to perform a search against public databases, for example, to identify other 
family members or related sequences. For more information about public databases, 
see page 26, above. 

The nucleic acid fragments of the invention provide probes or primers in
assays such as those described below. "Probes" are oligonucleotides that hybridize in
a base-specific manner to a complementary strand of nucleic acid. Such probes
include peptide nucleic acids, as described in Nielsen et al. (1991) Science
254:1497-1500. Typically, a probe comprises a region of nucleotide sequence that
hybridizes under highly stringent conditions to at least about 15, typically about
20-25, and more typically about 30, 40, 50 or 75 consecutive nucleotides of the nucleic
acid sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, and the
complements thereof. More typically, the probe further comprises a label, e.g.,
radioisotope, fluorescent compound, enzyme, or enzyme co-factor.

As used herein, the term "primer" refers to a single-stranded oligonucleotide
which acts as a point of initiation of template-directed DNA synthesis using
well-known methods (e.g., PCR, LCR) including, but not limited to, those described
herein. The appropriate length of the primer depends on the particular use, but
typically ranges from about 15 to 30 nucleotides. The term "primer site" refers to the
area of the target DNA to which a primer hybridizes. The term "primer pair" refers to a
set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the
nucleic acid sequence to be amplified and a 3' (downstream) primer that hybridizes with
the complement of the 3' end of the sequence to be amplified.
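
The primer-pair definition can be illustrated with a deliberately naive sketch. Assuming
a target written 5' to 3' on one strand, the forward primer below copies the first n
nucleotides of that strand and the reverse primer is the reverse complement of its last
n nucleotides, so the two anneal to opposite strands and prime toward each other. The
helper names are hypothetical, and the Wallace rule used for the melting temperature
(2 degrees C per A/T plus 4 degrees C per G/C) is a rough estimate meant only for short
oligonucleotides; real primer design also checks self-complementarity, primer dimers,
and uniqueness in the genome.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(target: str, length: int = 20) -> Tuple[str, str]:
    """Naive primers flanking the whole target (hypothetical helper)."""
    if not 15 <= length <= 30:
        raise ValueError("primer length outside the typical 15-30 nt range")
    if len(target) < 2 * length:
        raise ValueError("target too short for non-overlapping primers")
    forward = target[:length].upper()                       # same sequence as the 5' end
    reverse = reverse_complement(target[-length:].upper())  # anneals near the 3' end
    return forward, reverse


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule estimate in degrees C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


toy_target = "ACGTACGTACGTACGT" + "T" * 10 + "GGGGGCCCCCAAAAA"  # toy sequence
fwd, rev = primer_pair(toy_target, 15)
print(fwd, wallace_tm(fwd), round(gc_fraction(fwd), 2))
print(rev, wallace_tm(rev), round(gc_fraction(rev), 2))
```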

Sulfatase polynucleotides are thus useful for probes, primers, and in biological
assays. Where the polynucleotides are used to assess sulfatase properties or functions,
such as in the assays described herein, all or less than all of the entire cDNA can be
useful. Assays specifically directed to sulfatase functions, such as assessing agonist or
antagonist activity, encompass the use of known fragments. Further, diagnostic methods
for assessing sulfatase function can also be practiced with any fragment, including those
fragments that may have been known prior to the invention. Similarly, in methods
involving treatment of sulfatase dysfunction, all fragments are encompassed, including
those that may have been known in the art.

Sulfatase polynucleotides are useful as hybridization probes for cDNA and
genomic DNA to isolate full-length cDNA and genomic clones encoding the
polypeptides described in SEQ ID NOS:1, 3, 5, or 7, and to isolate cDNA and genomic
clones that correspond to variants producing the same polypeptides shown in SEQ ID
NOS:1, 3, 5, or 7, or the other variants described herein. Variants can be isolated from
the same tissue and organism from which a polypeptide shown in SEQ ID NOS:1, 3, 5,
or 7 was isolated, from different tissues of the same organism, or from different
organisms. This method is useful for isolating genes and cDNA that are
developmentally controlled and therefore may be expressed in the same tissue or
different tissues at different points in the development of an organism.

The probe can correspond to any sequence along the entire length of the gene
encoding the sulfatase polypeptide. Accordingly, it could be derived from 5' noncoding
regions, the coding region, and 3' noncoding regions.

The nucleic acid probe can be, for example, the full-length cDNA of SEQ ID
NOS:2, 4, 6, 8, 11, 12, 13, or 14 or a fragment thereof, such as an oligonucleotide of at
least 5, 10, 15, 20, 25, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to
specifically hybridize under stringent conditions to mRNA or DNA.

Fragments of the polynucleotides described herein are also useful to synthesize
larger fragments or full-length polynucleotides described herein, ribozymes or antisense
molecules. For example, a fragment can be hybridized to any portion of an mRNA and a
larger or full-length cDNA can be produced.

Antisense nucleic acids of the invention can be designed using the nucleotide
sequences of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14 and constructed using chemical
synthesis and enzymatic ligation reactions using procedures known in the art. For
example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be
chemically synthesized using naturally occurring nucleotides or variously modified
nucleotides designed to increase the biological stability of the molecules or to increase
the physical stability of the duplex formed between the antisense and sense nucleic
acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be
used. Examples of modified nucleotides which can be used to generate the antisense
nucleic acid include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil,
hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil,
dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine,
2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine,
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic
acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically
using an expression vector into which a nucleic acid has been subcloned in an
antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of
an antisense orientation to a target nucleic acid of interest).
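
The antisense orientation described above means the antisense strand is the reverse
complement of the sense target, read 5' to 3'. A minimal sketch, assuming an unmodified
target region supplied 5' to 3' and hypothetical helper names, derives the antisense
RNA and the corresponding DNA oligonucleotide. It models only base pairing; none of the
modified nucleotides or backbone changes listed above are represented.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(target: str) -> str:
    """Antisense RNA (5'->3') for a sense target given 5'->3' (DNA or RNA letters)."""
    seq = target.upper().replace("T", "U")
    if set(seq) - set("ACGU"):
        raise ValueError("sequence may contain only A, C, G, T or U")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def antisense_dna_oligo(target: str) -> str:
    """Same antisense sequence written as a DNA oligonucleotide."""
    return antisense_rna(target).replace("U", "T")


region = "AUGGCCAAGUUC"  # toy sense-strand mRNA region, not from the invention
print(antisense_rna(region))
print(antisense_dna_oligo(region))
```

Because the antisense strand is the reverse complement, applying the transformation
twice returns the original sense sequence.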

Additionally, the nucleic acid molecules of the invention can be modified at
the base moiety, sugar moiety or phosphate backbone to improve, e.g., the stability,
hybridization, or solubility of the molecule. For example, the deoxyribose phosphate
backbone of the nucleic acids can be modified to generate peptide nucleic acids (see
Hyrup et al. (1996) Bioorganic & Medicinal Chemistry 4:5). As used herein, the
terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA
mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide
backbone and only the four natural nucleobases are retained. The neutral backbone of
PNAs has been shown to allow for specific hybridization to DNA and RNA under
conditions of low ionic strength. The synthesis of PNA oligomers can be performed
using standard solid phase peptide synthesis protocols as described in Hyrup et al.
(1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl. Acad. Sci. USA 93:14670.
PNAs can be further modified, e.g., to enhance their stability, specificity or cellular
uptake, by attaching lipophilic or other helper groups to PNA, by the formation of
PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery
known in the art. The synthesis of PNA-DNA chimeras can be performed as
described in Hyrup (1996), supra, Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63, Mag et al. (1989) Nucleic Acids Res. 17:5973, and Petersen et al.
(1995) Bioorganic Med. Chem. Lett. 5:1119.

The nucleic acid molecules and fragments of the invention can also include
other appended groups such as peptides (e.g., for targeting host cell sulfatases in
vivo), or agents facilitating transport across the cell membrane (see, e.g., Letsinger et
al. (1989) Proc. Natl. Acad. Sci. USA 86:6553-6556; Lemaitre et al. (1987) Proc.
Natl. Acad. Sci. USA 84:648-652; PCT Publication No. WO 88/0918) or the
blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition,
oligonucleotides can be modified with hybridization-triggered cleavage agents (see,
e.g., Krol et al. (1988) BioTechniques 6:958-976) or intercalating agents (see, e.g.,
Zon (1988) Pharm. Res. 5:539-549).

Sulfatase polynucleotides are also useful as primers for PCR to amplify any
given region of a sulfatase polynucleotide.

Sulfatase polynucleotides are also useful for constructing recombinant vectors.
Such vectors include expression vectors that express a portion of, or all of, the sulfatase
polypeptides. Vectors also include insertion vectors, used to integrate into another
polynucleotide sequence, such as into the cellular genome, to alter in situ expression of
sulfatase genes and gene products. For example, an endogenous sulfatase coding
sequence can be replaced via homologous recombination with all or part of the coding
region containing one or more specifically introduced mutations.

Sulfatase polynucleotides are also useful for expressing antigenic portions of
sulfatase proteins.

Sulfatase polynucleotides are also useful as probes for determining the
chromosomal positions of sulfatase polynucleotides by means of in situ hybridization
methods, such as FISH (for a review of this technique, see Verma et al. (1988) Human
Chromosomes: A Manual of Basic Techniques (Pergamon Press, New York)), and PCR
mapping of somatic cell hybrids. The mapping of the sequences to chromosomes is an
important first step in correlating these sequences with genes associated with disease.

Reagents for chromosome mapping can be used individually to mark a single
chromosome or a single site on that chromosome, or panels of reagents can be used for
marking multiple sites and/or multiple chromosomes. Reagents corresponding to
noncoding regions of the genes actually are preferred for mapping purposes, because
coding sequences are more likely to be conserved within gene families, thus increasing
the chance of cross hybridizations during chromosomal mapping.

Once a sequence has been mapped to a precise chromosomal location, the
physical position of the sequence on the chromosome can be correlated with genetic map
data. (Such data are found, for example, in V. McKusick, Mendelian Inheritance in
Man, available on-line through Johns Hopkins University Welch Medical Library.) The
relationship between a gene and a disease mapped to the same chromosomal region can
then be identified through linkage analysis (co-inheritance of physically adjacent genes),
described in, for example, Egeland et al. (1987) Nature 325:783-787.

Moreover, differences in the DNA sequences between individuals affected and
unaffected with a disease associated with a specified gene can be determined. If a
mutation is observed in some or all of the affected individuals but not in any unaffected
individuals, then the mutation is likely to be the causative agent of the particular disease.
Comparison of affected and unaffected individuals generally involves first looking for
structural alterations in the chromosomes, such as deletions or translocations, that are
visible from chromosome spreads, or detectable using PCR based on that DNA
sequence. Ultimately, complete sequencing of genes from several individuals can be
performed to confirm the presence of a mutation and to distinguish mutations from
polymorphisms.
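
The comparison rule above (a change seen in some or all affected individuals but in no
unaffected individual) can be sketched as a simple filter. This assumes each person's
sequence differences from a reference have already been called and are encoded as
hypothetical (position, change) pairs; the function name and the encoding are
illustrative only. The output is a list of candidates with carrier counts, which would
still need confirmation by sequencing to separate causative mutations from
polymorphisms.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[int, str]  # hypothetical encoding: (position, change such as "A>G")


def candidate_causal_variants(affected: Iterable[Set[Variant]],
                              unaffected: Iterable[Set[Variant]]) -> Dict[Variant, int]:
    """Variants carried by at least one affected and no unaffected individual.

    Returns a mapping from variant to the number of affected carriers.
    """
    seen_in_unaffected: Set[Variant] = set()
    for calls in unaffected:
        seen_in_unaffected |= set(calls)
    carriers: Counter = Counter()
    for calls in affected:
        carriers.update(set(calls))  # count each individual at most once
    return {v: n for v, n in carriers.items() if v not in seen_in_unaffected}


affected_calls = [{(120, "A>G"), (300, "C>T")}, {(120, "A>G")}]
unaffected_calls = [{(300, "C>T")}, set()]
print(candidate_causal_variants(affected_calls, unaffected_calls))
```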

Sulfatase polynucleotide probes are also useful to determine patterns of the
presence of the gene encoding sulfatases and their variants with respect to tissue
distribution, for example, whether gene duplication has occurred and whether the
duplication occurs in all or only a subset of tissues. The genes can be naturally occurring
or can have been introduced into a cell, tissue, or organism exogenously.

Sulfatase polynucleotides are also useful for designing ribozymes corresponding
to all, or a part, of the mRNA produced from genes encoding the polynucleotides
described herein.

Sulfatase polynucleotides are also useful for constructing host cells expressing a
part, or all, of a sulfatase polynucleotide or polypeptide.

Sulfatase polynucleotides are also useful for constructing transgenic animals
expressing all, or a part, of a sulfatase polynucleotide or polypeptide.

Sulfatase polynucleotides are also useful for making vectors that express part, or
all, of a sulfatase polypeptide.

Sulfatase polynucleotides are also useful as hybridization probes for determining
the level of sulfatase nucleic acid expression. Accordingly, the probes can be used to
detect the presence of, or to determine levels of, sulfatase nucleic acid in cells, tissues,
and organisms. The nucleic acid whose level is determined can be DNA or RNA.
Accordingly, probes corresponding to the polypeptides described herein can be used to
assess gene copy number in a given cell, tissue, or organism. This is particularly
relevant in cases in which there has been an amplification of a sulfatase gene.
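
One simple way to turn probe signals into a copy-number estimate, offered only as a
sketch, is to normalize the sulfatase probe signal to a reference locus in both a test
sample and a normal sample and scale the ratio by the normal copy number. The
assumptions are that signal is proportional to copy number, that the reference locus
is present at the same number of copies in both samples, and that the normal sample
is diploid (two copies); the function name is hypothetical.

```python
def relative_copy_number(test_target: float, test_reference: float,
                         normal_target: float, normal_reference: float,
                         normal_copies: float = 2.0) -> float:
    """Estimate sulfatase gene copies in a test sample (hypothetical helper).

    Assumes probe signal is proportional to copy number and that the reference
    locus is present at normal_copies in both samples.
    """
    for value in (test_target, test_reference, normal_target, normal_reference):
        if value <= 0:
            raise ValueError("signals must be positive")
    test_ratio = test_target / test_reference        # target per reference, test sample
    normal_ratio = normal_target / normal_reference  # same ratio, normal sample
    return normal_copies * test_ratio / normal_ratio


# A threefold higher normalized signal suggests about six copies, i.e. amplification.
print(relative_copy_number(300, 100, 100, 100))
```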

Alternatively, the probe can be used in an in situ hybridization context to assess
the position of extra copies of a sulfatase gene, as on extrachromosomal elements or as
integrated into chromosomes in which the sulfatase gene is not normally found, for
example, as a homogeneously staining region.

These uses are relevant for diagnosis of disorders involving an increase or
decrease in sulfatase expression relative to normal, such as a proliferative disorder, a
differentiative or developmental disorder, or a hematopoietic disorder. Disorders in
which sulfatase expression is relevant include, but are not limited to, those disclosed
herein above.

Disorders in which 22438 sulfatase expression is relevant include, but are not
limited to, those involving the tissues as disclosed herein and those associated with
pain.

Disorders in which 23553 sulfatase expression is relevant include, but are not 
limited to, breast and colon carcinoma. 

Disorders in which 25278 sulfatase expression is relevant include, but are not 
limited to, colon carcinoma. 

Disorders in which 26212 sulfatase expression is relevant include, but are not
limited to, hemangioma and uterine adenocarcinoma.

Thus, the present invention provides a method for identifying a disease or
disorder associated with aberrant expression or activity of a sulfatase nucleic acid, in
which a test sample is obtained from a subject and nucleic acid (e.g., mRNA, genomic
DNA) is detected, wherein the presence of the nucleic acid is diagnostic for a subject
having or at risk of developing a disease or disorder associated with aberrant expression
or activity of the nucleic acid.

One aspect of the invention relates to diagnostic assays for determining
nucleic acid expression as well as activity in the context of a biological sample (e.g.,
blood, serum, cells, tissue) to determine whether an individual has a disease or
disorder, or is at risk of developing a disease or disorder, associated with aberrant
nucleic acid expression or activity. Such assays can be used for prognostic or
predictive purposes to thereby prophylactically treat an individual prior to the onset of
a disorder characterized by or associated with expression or activity of the nucleic
acid molecules.

In vitro techniques for detection of mRNA include Northern hybridizations and
in situ hybridizations. In vitro techniques for detecting DNA include Southern
hybridizations and in situ hybridization.

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues
that express a sulfatase, such as by measuring the level of a sulfatase-encoding nucleic
acid in a sample of cells from a subject, e.g., mRNA or genomic DNA, or determining if
the sulfatase gene has been mutated.

Nucleic acid expression assays are useful for drug screening to identify
compounds that modulate sulfatase nucleic acid expression (e.g., antisense,
polypeptides, peptidomimetics, small molecules or other drugs). A cell is contacted
with a candidate compound and the expression of mRNA determined. The level of
expression of the mRNA in the presence of the candidate compound is compared to the
level of expression of the mRNA in the absence of the candidate compound. The
candidate compound can then be identified as a modulator of nucleic acid expression
based on this comparison and be used, for example, to treat a disorder characterized by
aberrant nucleic acid expression. The modulator can bind to the nucleic acid or
indirectly modulate expression, such as by interacting with other cellular components
that affect nucleic acid expression.

Modulatory methods can be performed in vitro (e.g., by culturing the cell with
the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject) in
patients or in transgenic animals. The invention thus provides a method for identifying a
compound that can be used to treat a disorder associated with nucleic acid expression of
a sulfatase gene. The method typically includes assaying the ability of the compound to
modulate the expression of the sulfatase nucleic acid and thus identifying a compound
that can be used to treat a disorder characterized by undesired sulfatase nucleic acid
expression.

The assays can be performed in cell-based and cell-free systems. Cell-based
assays include cells naturally expressing the sulfatase nucleic acid or recombinant cells
genetically engineered to express specific nucleic acid sequences. Alternatively,
candidate compounds can be assayed in vivo in patients or in transgenic animals.

The assay for sulfatase nucleic acid expression can involve direct assay of
nucleic acid levels, such as mRNA levels, or assay of collateral compounds (such as
substrate hydrolysis). Further, the expression of genes that are up- or down-regulated
in response to sulfatase activity can also be assayed. In this embodiment the regulatory
regions of these genes can be operably linked to a reporter gene such as luciferase.

Thus, modulators of sulfatase gene expression can be identified in a method
wherein a cell is contacted with a candidate compound and the expression of mRNA
determined. The level of expression of sulfatase mRNA in the presence of the candidate
compound is compared to the level of expression of sulfatase mRNA in the absence of
the candidate compound. The candidate compound can then be identified as a modulator
of nucleic acid expression based on this comparison and be used, for example, to treat a
disorder characterized by aberrant nucleic acid expression. When expression of mRNA
is statistically significantly greater in the presence of the candidate compound than in its
absence, the candidate compound is identified as a stimulator of nucleic acid expression.
When nucleic acid expression is statistically significantly less in the presence of the
candidate compound than in its absence, the candidate compound is identified as an
inhibitor of nucleic acid expression.
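
The text does not specify which statistical test decides "statistically significantly
greater" or "less". As one possible sketch, the Python code below uses an exact
two-sided permutation test on the difference in mean mRNA levels between replicate
wells with and without the candidate compound, then labels the compound a stimulator
or an inhibitor. The test choice, the 0.05 threshold, and the function names are
assumptions. Note that with only three replicates per group the smallest achievable
two-sided p-value is 2/20 = 0.1, so an exact test of this kind needs more replicates
to reach 0.05.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence, Tuple


def permutation_p_value(treated: Sequence[float], control: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means."""
    if not treated or not control:
        raise ValueError("both groups need at least one measurement")
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    # Enumerate every way of relabeling len(treated) measurements as "treated".
    for chosen in combinations(range(len(pooled)), len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point ties with the observed split count.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_compound(treated: Sequence[float], control: Sequence[float],
                      alpha: float = 0.05) -> Tuple[str, float]:
    """Label a candidate compound from mRNA levels with and without it."""
    p = permutation_p_value(treated, control)
    if p >= alpha:
        return "no significant effect", p
    label = "stimulator" if mean(treated) > mean(control) else "inhibitor"
    return label, p


with_compound = [10.2, 11.0, 12.4, 13.1]  # hypothetical relative mRNA levels
without_compound = [1.1, 2.0, 3.3, 4.2]
print(classify_compound(with_compound, without_compound))
```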

Accordingly, the invention provides methods of treatment, with the nucleic acid
as a target, using a compound identified through drug screening as a gene modulator to
modulate sulfatase nucleic acid expression. Modulation includes up-regulation (i.e.,
activation or agonization), down-regulation (i.e., suppression or antagonization), and
effects on nucleic acid activity (e.g., when the nucleic acid is mutated or improperly
modified). Treatment is of disorders characterized by aberrant expression or activity of
the nucleic acid.

Alternatively, a modulator of sulfatase nucleic acid expression can be a small
molecule or drug identified using the screening assays described herein, as long as the
drug or small molecule inhibits sulfatase nucleic acid expression.

Sulfatase polynucleotides are also useful for monitoring the effectiveness of
modulating compounds on the expression or activity of a sulfatase gene in clinical trials
or in a treatment regimen. Thus, the gene expression pattern can serve as a barometer
for the continuing effectiveness of treatment with the compound, particularly with
compounds to which a patient can develop resistance. The gene expression pattern can
also serve as a marker indicative of a physiological response of the affected cells to the
compound. Accordingly, such monitoring would allow either increased administration
of the compound or the administration of alternative compounds to which the patient has
not become resistant. Similarly, if the level of nucleic acid expression falls below a
desirable level, administration of the compound could be commensurately decreased.

Monitoring can be, for example, as follows: (i) obtaining a pre-administration
sample from a subject prior to administration of the agent; (ii) detecting the level of
expression of a specified mRNA or genomic DNA of the invention in the
pre-administration sample; (iii) obtaining one or more post-administration samples from
the subject; (iv) detecting the level of expression or activity of the mRNA or genomic
DNA in the post-administration samples; (v) comparing the level of expression or
activity of the mRNA or genomic DNA in the pre-administration sample with that of the
mRNA or genomic DNA in the post-administration sample or samples; and (vi)
increasing or decreasing the administration of the agent to the subject accordingly.
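
Steps (v) and (vi) can be sketched as a comparison of post- to pre-administration
expression against a target window. This sketch assumes the agent is meant to lower
sulfatase expression (for example, an inhibitor), so an insufficient drop suggests
increasing administration and a drop below the desirable level suggests decreasing it,
in line with the preceding paragraph. The target window, the use of the mean of the
post-administration samples, and the function name are hypothetical; switching to an
alternative compound is not modeled.

```python
from statistics import mean
from typing import Sequence, Tuple


def adjust_administration(pre_level: float, post_levels: Sequence[float],
                          target_range: Tuple[float, float]) -> Tuple[str, float]:
    """Suggest a dosing direction from expression before and after the agent.

    target_range gives the acceptable (low, high) post/pre expression ratio for
    an agent intended to reduce expression (assumed for this sketch).
    """
    if pre_level <= 0 or not post_levels:
        raise ValueError("need a positive pre-administration level and post samples")
    low, high = target_range
    if not 0 <= low <= high:
        raise ValueError("invalid target range")
    ratio = mean(post_levels) / pre_level  # step (v): compare post with pre
    if ratio > high:
        return "increase", ratio   # expression not reduced enough
    if ratio < low:
        return "decrease", ratio   # expression below the desirable level
    return "maintain", ratio


print(adjust_administration(100.0, [30.0, 50.0], (0.3, 0.6)))
```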

Sulfatase polynucleotides are also useful in diagnostic assays for qualitative
changes in sulfatase nucleic acid, and particularly in qualitative changes that lead to
pathology. The polynucleotides can be used to detect mutations in sulfatase genes and
gene expression products such as mRNA. The polynucleotides can be used as
hybridization probes to detect naturally-occurring genetic mutations in a sulfatase gene
and thereby to determine whether a subject with the mutation is at risk for a disorder
caused by the mutation. Mutations include deletion, addition, or substitution of one or
more nucleotides in the gene; chromosomal rearrangement, such as inversion or
transposition; modification of genomic DNA, such as aberrant methylation patterns; or
changes in gene copy number, such as amplification. Detection of a mutated form of a
sulfatase gene associated with a dysfunction provides a diagnostic tool for an active
disease or susceptibility to disease when the disease results from overexpression,
underexpression, or altered expression of a sulfatase.

Mutations in a sulfatase gene can be detected at the nucleic acid level by a
variety of techniques. Genomic DNA can be analyzed directly or can be amplified by
PCR prior to analysis. RNA or cDNA can be used in the same way.

In certain embodiments, detection of the mutation involves the use of a
probe/primer in a polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195
and 4,683,202), such as anchor PCR or RACE PCR, or, alternatively, in a ligation chain
reaction (LCR) (see, e.g., Landegren et al. (1988) Science 241:1077-1080; and
Nakazawa et al. (1994) PNAS 91:360-364), the latter of which can be particularly useful
for detecting point mutations in the gene (see Abravaya et al. (1995) Nucleic Acids Res.
23:675-682). This method can include the steps of collecting a sample of cells from a
patient, isolating nucleic acid (e.g., genomic DNA, mRNA, or both) from the cells of the
sample, contacting the nucleic acid sample with one or more primers that specifically
hybridize to a gene under conditions such that hybridization and amplification of the
gene (if present) occur, and detecting the presence or absence of an amplification
product, or detecting the size of the amplification product and comparing the length to
that of a control sample. Deletions and insertions can be detected by a change in size of
the amplified product compared to the normal genotype. Point mutations can be identified
by hybridizing amplified DNA to normal RNA or antisense DNA sequences.
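The size comparison for deletions and insertions can be sketched as below, assuming the observed and control product sizes (in base pairs) have already been read from a gel or fragment analyzer. The function name and the `tolerance_bp` parameter, a hypothetical allowance for sizing resolution, are illustrative assumptions. Point mutations generally leave the product size unchanged, which is why they call for the hybridization step instead.

```python
def classify_amplicon(observed_bp: int, control_bp: int, tolerance_bp: int = 0) -> str:
    """Compare an amplification product's size with the normal-genotype control product."""
    delta = observed_bp - control_bp
    if abs(delta) <= tolerance_bp:
        # Within sizing resolution: no detectable deletion or insertion.
        return "no size change"
    if delta < 0:
        return f"deletion of about {-delta} bp"
    return f"insertion of about {delta} bp"


print(classify_amplicon(480, 500))  # deletion of about 20 bp
```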

It is anticipated that PCR and/or LCR may be desirable as a preliminary
amplification step in conjunction with any of the techniques for detecting mutations
described herein.

Alternative amplification methods include self-sustained sequence replication
(Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), the transcriptional
amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177),
Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6:1197), or any other nucleic
acid amplification method, followed by detection of the amplified molecules using
techniques well known to those of skill in the art. These detection schemes are
especially useful for detecting nucleic acid molecules that are present in very low
numbers.

Alternatively, mutations in a sulfatase gene can be directly identified, for
example, by alterations in restriction enzyme digestion patterns determined by gel
electrophoresis.
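A restriction-site change can be illustrated in silico. The sketch below assumes a linear sequence and uses EcoRI (recognition site GAATTC, cut between G and A) purely as an example enzyme; because GAATTC is palindromic, scanning one strand finds every site. The function name is an assumption. A substitution that destroys the site merges two fragments into one, which is the pattern change a gel would reveal.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths from cutting a linear DNA sequence at every occurrence of `site`.

    The defaults model EcoRI (G^AATTC): the top strand is cut 1 base into the site.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragments lie between consecutive boundaries: sequence start, each cut, sequence end.
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


wild_type = "AAAAGAATTCTTTT"
mutant = "AAAAGATTTCTTTT"  # A->T substitution inside the site
print(digest(wild_type))  # [5, 9]
print(digest(mutant))     # [14]
```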

Further, sequence-specific ribozymes (U.S. Patent No. 5,498,531) can be used to 
score for the presence of specific mutations by development or loss of a ribozyme 
cleavage site. 

Perfectly matched sequences can be distinguished from mismatched sequences
by nuclease cleavage digestion assays or by differences in melting temperature.

Sequence changes at specific locations can also be assessed by nuclease
protection assays, such as RNase and S1 protection, or by the chemical cleavage method.
Furthermore, sequence differences between a mutant sulfatase gene and a wild-type
gene can be determined by direct DNA sequencing. A variety of automated
sequencing procedures can be utilized when performing the diagnostic assays ((1995)
Biotechniques 19:448), including sequencing by mass spectrometry (see, e.g., PCT
International Publication No. WO 94/16101; Cohen et al. (1996) Adv. Chromatogr.
36:127-162; and Griffin (1993) Appl. Biochem. Biotechnol. 38:147-159).

Other methods for detecting mutations in the gene include methods in which
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or
RNA/DNA duplexes (Myers et al. (1985) Science 230:1242; Cotton et al. (1988) PNAS
85:4397; Saleeba et al. (1992) Meth. Enzymol. 217:286-295); methods in which the
electrophoretic mobility of mutant and wild-type nucleic acids is compared (Orita et al.
(1989) PNAS 86:2766; Cotton et al. (1993) Mutat. Res. 285:125-144; and Hayashi et al.
(1992) Genet. Anal. Tech. Appl. 9:73-79); and methods in which the movement of mutant
or wild-type fragments in polyacrylamide gels containing a gradient of denaturant is
assayed using denaturing gradient gel electrophoresis (Myers et al. (1985) Nature
313:495). The sensitivity of the assay may be enhanced by using RNA (rather than DNA),
in which the secondary structure is more sensitive to a change in sequence. In one
embodiment, the subject method utilizes heteroduplex analysis to separate double-stranded
heteroduplex molecules on the basis of changes in electrophoretic mobility (Keen et al.
(1991) Trends Genet. 7:5). Examples of other techniques for detecting point mutations
include selective oligonucleotide hybridization, selective amplification, and selective
primer extension.

In other embodiments, genetic mutations can be identified by hybridizing
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing
hundreds or thousands of oligonucleotide probes (Cronin et al. (1996) Human Mutation
7:244-255; Kozal et al. (1996) Nature Medicine 2:753-759). For example, genetic
mutations can be identified in two-dimensional arrays containing light-generated DNA
probes as described in Cronin et al., supra. Briefly, a first hybridization array of probes
can be used to scan through long stretches of DNA in a sample and control to identify
base changes between the sequences by making linear arrays of sequential overlapping
probes. This step allows the identification of point mutations. This step is followed by a
second hybridization array that allows the characterization of specific mutations by
using smaller, specialized probe arrays complementary to all variants or mutations
detected. Each mutation array is composed of parallel probe sets, one complementary to
the wild-type gene and the other complementary to the mutant gene.
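The two array stages can be sketched as follows. The first function lays out a linear array of sequential, overlapping probe windows; the second builds a wild-type/mutant pair centered on a detected variant. Both return target-strand sequences, so the physical probes would be their complements. Probe length, step, and function names are illustrative assumptions rather than the design used by Cronin et al.

```python
def tile_probes(sequence: str, probe_len: int, step: int = 1) -> list[tuple[int, str]]:
    """Sequential overlapping windows as (start position, target sequence)."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    return [(i, sequence[i:i + probe_len])
            for i in range(0, len(sequence) - probe_len + 1, step)]


def mutation_probe_pair(sequence: str, position: int, mutant_base: str,
                        probe_len: int = 5) -> tuple[str, str]:
    """Wild-type and mutant windows centered (where possible) on a variant position."""
    start = max(0, position - probe_len // 2)
    wild = sequence[start:start + probe_len]
    offset = position - start  # index of the variant within the window
    mutant = wild[:offset] + mutant_base + wild[offset + 1:]
    return wild, mutant


print(tile_probes("ACGTAC", 4))                # [(0, 'ACGT'), (1, 'CGTA'), (2, 'GTAC')]
print(mutation_probe_pair("AACGTTA", 3, "A"))  # ('ACGTT', 'ACATT')
```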
Sulfatase polynucleotides are also useful for testing an individual for a genotype
that, while not necessarily causing the disease, nevertheless affects the treatment
modality. Thus, the polynucleotides can be used to study the relationship between an
individual's genotype and the individual's response to a compound used for treatment
(pharmacogenomic relationship). In the present case, for example, a mutation in the
sulfatase gene that results in altered affinity for a substrate-related compound could
result in an excessive or decreased drug effect with standard concentrations of the
compound. Accordingly, the sulfatase polynucleotides described herein can be used to
assess the mutation content of the gene in an individual in order to select an appropriate
compound or dosage regimen for treatment.
Thus, polynucleotides displaying genetic variations that affect treatment provide a
diagnostic target that can be used to tailor treatment in an individual. Accordingly, the
production of recombinant cells and animals containing these polymorphisms allows
effective clinical design of treatment compounds and dosage regimens.

The methods can involve obtaining a control biological sample from a control
subject, contacting the control sample with a compound or agent capable of detecting
mRNA or genomic DNA, such that the presence of mRNA or genomic DNA is
detected in the biological sample, and comparing the presence of mRNA or genomic
DNA in the control sample with the presence of mRNA or genomic DNA in the test
sample.

Sulfatase polynucleotides are also useful for chromosome identification when the
sequence is identified with an individual chromosome and with a particular location on
the chromosome. First, the DNA sequence is matched to the chromosome by in situ or
other chromosome-specific hybridization. Sequences can also be correlated to specific
chromosomes by preparing PCR primers that can be used for PCR screening of somatic
cell hybrids containing individual chromosomes from the desired species. Only hybrids
containing the chromosome that carries the gene homologous to the primer will yield an
amplified fragment. Sublocalization can be achieved using chromosomal fragments.
Other strategies include prescreening with labeled flow-sorted chromosomes and
preselection by hybridization to chromosome-specific libraries. Further mapping
strategies include fluorescence in situ hybridization, which allows hybridization with
probes shorter than those traditionally used. Reagents for chromosome mapping can be
used individually to mark a single chromosome or a single site on the chromosome, or
panels of reagents can be used for marking multiple sites and/or multiple chromosomes.
Reagents corresponding to noncoding regions of the genes are actually preferred for
mapping purposes. Coding sequences are more likely to be conserved within gene
families, thus increasing the chance of cross-hybridization during chromosomal
mapping.

Sulfatase polynucleotides can also be used to identify individuals from small
biological samples. This can be done, for example, using restriction fragment-length
polymorphism (RFLP) to identify an individual. Thus, the polynucleotides described
herein are useful as DNA markers for RFLP (see U.S. Patent No. 5,272,057).

Furthermore, the sulfatase sequences can be used to provide an alternative
technique, which determines the actual DNA sequence of selected fragments in the
genome of an individual. Thus, the sulfatase sequences described herein can be used to
prepare two PCR primers from the 5' and 3' ends of the sequences. These primers can
then be used to amplify DNA from an individual for subsequent sequencing.
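Deriving the two primers from the ends of a sequence can be sketched as below: the forward primer is read directly from the 5' end, and the reverse primer is the reverse complement of the 3' end. The function names and the 20-nucleotide default are assumptions; practical primer design also checks melting temperature, GC content, and secondary structure, which this sketch does not.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G), read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def end_primers(template: str, length: int = 20) -> tuple[str, str]:
    """Forward primer from the 5' end; reverse primer from the 3' end (reverse-complemented)."""
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


print(end_primers("ATGCCCGGGTTTAAA", length=4))  # ('ATGC', 'TTTA')
```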

Panels of corresponding DNA sequences from individuals prepared in this
manner can provide unique individual identifications, as each individual will have a
unique set of such DNA sequences. It is estimated that allelic variation in humans
occurs with a frequency of about once per 500 bases. Allelic variation occurs to
some degree in the coding regions of these sequences, and to a greater degree in the
noncoding regions. Sulfatase sequences can be used to obtain such identification
sequences from individuals and from tissue. The sequences represent unique fragments
of the human genome. Each of the sequences described herein can, to some degree, be
used as a standard against which DNA from an individual can be compared for
identification purposes.
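Comparing an individual's sequence against a standard, and estimating how many differences to expect from the figure of about one variant per 500 bases, can be sketched as follows. The sketch assumes the two sequences are already aligned with no insertions or deletions, and the function names are hypothetical. The 1-in-500 rate is the average quoted above, so noncoding regions would typically show more differences and coding regions fewer.

```python
def differing_positions(sample: str, standard: str) -> list[int]:
    """0-based positions where an aligned sample sequence differs from the standard."""
    if len(sample) != len(standard):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(sample, standard)) if a != b]


def expected_variant_sites(length_bp: int, bases_per_variant: int = 500) -> float:
    """Average number of allelic differences expected in a region of length_bp."""
    return length_bp / bases_per_variant


print(differing_positions("ACGTAC", "ACCTAG"))  # [2, 5]
print(expected_variant_sites(2000))             # 4.0
```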

If a panel of reagents from the sequences is used to generate a unique
identification database for an individual, those same reagents can later be used to identify
tissue from that individual. Using the unique identification database, positive
identification of the individual, living or dead, can be made from extremely small tissue
samples.

Sulfatase polynucleotides can also be used in forensic identification procedures.
PCR technology can be used to amplify DNA sequences taken from very small
biological samples, such as a single hair follicle or body fluids (e.g., blood, saliva, or
semen). The amplified sequence can then be compared to a standard, allowing
identification of the origin of the sample.

Sulfatase polynucleotides can thus be used to provide polynucleotide reagents,
e.g., PCR primers, targeted to specific loci in the human genome, which can enhance the
reliability of DNA-based forensic identifications by, for example, providing another
"identification marker" (i.e., another DNA sequence that is unique to a particular
individual). As described above, actual base sequence information can be used for
identification as an accurate alternative to patterns formed by restriction
enzyme-generated fragments. Sequences targeted to the noncoding region are particularly
useful, since greater polymorphism occurs in the noncoding regions, making it easier to
differentiate individuals using this technique.

Sulfatase polynucleotides can further be used to provide polynucleotide reagents,
e.g., labeled or labelable probes which can be used in, for example, an in situ
hybridization technique, to identify a specific tissue. This is useful in cases in which a
forensic pathologist is presented with a tissue of unknown origin. Panels of sulfatase
probes can be used to identify tissue by species and/or by organ type.

In a similar fashion, these primers and probes can be used to screen tissue culture 
for contamination (i.e. screen for the presence of a mixture of different types of cells in a 
culture). 

Alternatively, sulfatase polynucleotides can be used directly to block
transcription or translation of sulfatase gene sequences by means of antisense or
ribozyme constructs. Thus, in a disorder characterized by abnormally high or
undesirable sulfatase gene expression, nucleic acids can be directly used for treatment.

Sulfatase polynucleotides are thus useful as antisense constructs to control
sulfatase gene expression in cells, tissues, and organisms. A DNA antisense
polynucleotide is designed to be complementary to a region of the gene involved in
transcription, preventing transcription and hence production of sulfatase protein. An
antisense RNA or DNA polynucleotide would hybridize to the mRNA and thus block
translation of mRNA into sulfatase protein.

Examples of antisense molecules useful to inhibit nucleic acid expression include
antisense molecules complementary to a fragment of the 5' untranslated region of SEQ
ID NOS:2, 4, 6, or 8, which also includes the start codon, and antisense molecules that
are complementary to a fragment of the 3' untranslated region of SEQ ID NOS:2, 4, 6, or
8.
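The base-pairing rule behind such antisense molecules can be sketched as follows: an antisense RNA, written 5' to 3', is the reverse complement of its mRNA target. The function name is hypothetical, and the sketch ignores the chemical modifications and target-accessibility considerations that practical antisense design involves.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna_region: str) -> str:
    """Antisense RNA (5'->3') that base-pairs with an mRNA region given 5'->3'."""
    region = mrna_region.upper().replace("T", "U")  # accept DNA-style input
    return "".join(RNA_COMPLEMENT[base] for base in reversed(region))


# A region spanning the AUG start codon, as in the 5' untranslated-region example above.
print(antisense_rna("AUGGCC"))  # GGCCAU
```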

Alternatively, a class of antisense molecules can be used to inactivate mRNA in
order to decrease expression of sulfatase nucleic acid. Accordingly, these molecules can
treat a disorder characterized by abnormal or undesired sulfatase nucleic acid expression.
This technique involves cleavage by means of ribozymes containing nucleotide
sequences complementary to one or more regions in the mRNA, which attenuates the
ability of the mRNA to be translated. Possible regions include coding regions,
particularly coding regions corresponding to the catalytic and other functional activities
of the sulfatase protein.

Sulfatase polynucleotides also provide vectors for gene therapy in patients
containing cells that are aberrant in sulfatase gene expression. Thus, recombinant cells,
which include the patient's cells that have been engineered ex vivo and returned to the
patient, are introduced into an individual, where the cells produce the desired sulfatase
protein to treat the individual.

The invention also encompasses kits for detecting the presence of a sulfatase
nucleic acid in a biological sample. For example, the kit can comprise reagents such as a
labeled or labelable nucleic acid or agent capable of detecting sulfatase nucleic acid in a
biological sample; means for determining the amount of sulfatase nucleic acid in the
sample; and means for comparing the amount of sulfatase nucleic acid in the sample
with a standard. The compound or agent can be packaged in a suitable container. The
kit can further comprise instructions for using the kit to detect sulfatase mRNA or DNA.

Computer Readable Means 
The nucleotide or amino acid sequences of the invention are also provided in a
variety of media to facilitate their use. As used herein, "provided" refers to a
manufacture, other than an isolated nucleic acid or amino acid molecule, that
contains a nucleotide or amino acid sequence of the present invention. Such a
manufacture provides the nucleotide or amino acid sequences, or a subset thereof
(e.g., a subset of open reading frames (ORFs)), in a form that allows a skilled artisan
to examine the manufacture using means not directly applicable to examining the
nucleotide or amino acid sequences, or a subset thereof, as they exist in nature or in
purified form.

In one application of this embodiment, a nucleotide or amino acid sequence of
the present invention can be recorded on computer readable media. As used herein,
"computer readable media" refers to any medium that can be read and accessed
directly by a computer. Such media include, but are not limited to: magnetic storage
media, such as floppy discs, hard disc storage media, and magnetic tape; optical
storage media, such as CD-ROM; electrical storage media, such as RAM and ROM;
and hybrids of these categories, such as magnetic/optical storage media. The skilled
artisan will readily appreciate how any of the presently known computer readable
media can be used to create a manufacture comprising a computer readable medium
having recorded thereon a nucleotide or amino acid sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable medium. The skilled artisan can readily adopt any of the presently
known methods for recording information on computer readable medium to generate
manufactures comprising the nucleotide or amino acid sequence information of the
present invention.

A variety of data storage structures are available to a skilled artisan for
creating a computer readable medium having recorded thereon a nucleotide or amino
acid sequence of the present invention. The choice of the data storage structure will
generally be based on the means chosen to access the stored information. In addition,
a variety of data processor programs and formats can be used to store the nucleotide
sequence information of the present invention on computer readable medium. The
sequence information can be represented in a word processing text file, formatted in
commercially available software such as WordPerfect and Microsoft Word, or
represented in the form of an ASCII file, stored in a database application, such as
DB2, Sybase, Oracle, or the like. The skilled artisan can readily adapt any number of
data processor structuring formats (e.g., text file or database) in order to obtain
computer readable medium having recorded thereon the nucleotide sequence
information of the present invention.
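As one way to represent sequence information in an ASCII file and in a database, the sketch below writes records in FASTA format (a widely used plain-text convention that the text above does not name) and loads them into an in-memory SQLite table through Python's standard `sqlite3` module, standing in for DB2, Sybase, or Oracle. The function, table, and column names are illustrative assumptions.

```python
import sqlite3


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Render {identifier: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Inverse of to_fasta: join wrapped sequence lines under each '>' header."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def store_in_db(records: dict[str, str]) -> sqlite3.Connection:
    """Load sequences into an in-memory SQLite table keyed by identifier."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (id TEXT PRIMARY KEY, residues TEXT NOT NULL)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records.items())
    conn.commit()
    return conn
```

For example, `parse_fasta(to_fasta(records))` returns the original dictionary, and a parameterized `SELECT residues FROM sequences WHERE id = ?` query retrieves a stored sequence. The text file suits exchange between programs, while the database suits indexed lookup.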

By providing the nucleotide or amino acid sequences of the invention in
computer readable form, the skilled artisan can routinely access the sequence
information for a variety of purposes. For example, one skilled in the art can use the
nucleotide or amino acid sequences of the invention in computer readable form to
compare a target sequence or target structural motif with the sequence information
stored within the data storage means. Search means are used to identify fragments or
regions of the sequences of the invention that match a particular target sequence or
target motif.
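The search step can be sketched as an exact-match scan over stored sequences. This is a deliberately simple stand-in: tools such as BLAST score approximate matches rather than requiring identity. The dictionary-of-sequences layout, the identifiers, and the function name are assumptions.

```python
def find_target(database: dict[str, str], target: str) -> list[tuple[str, int]]:
    """(sequence id, 0-based start) for every exact, possibly overlapping, match of target."""
    target = target.upper()
    if not target:
        raise ValueError("target must be non-empty")
    hits = []
    for seq_id, seq in database.items():
        seq = seq.upper()
        start = seq.find(target)
        while start != -1:
            hits.append((seq_id, start))
            start = seq.find(target, start + 1)  # advance by one to allow overlapping hits
    return hits


stored = {"seq_a": "ACGACGA", "seq_b": "TTTT"}  # hypothetical stored sequences
print(find_target(stored, "ACGA"))  # [('seq_a', 0), ('seq_a', 3)]
```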

As used herein, a "target sequence" can be any DNA or amino acid sequence
of six or more nucleotides or two or more amino acids. A skilled artisan can readily
recognize that the longer a target sequence is, the less likely it is to be present as a
random occurrence in the database. The most preferred sequence length of a target
sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide
residues. However, it is well recognized that commercially important fragments, such
as sequence fragments involved in gene expression and protein processing, may be of
shorter length.
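The effect of target length can be made concrete with a back-of-the-envelope estimate. Assuming a random database sequence of length N with uniform composition over an alphabet of size a (4 for nucleotides, 20 for amino acids), a particular target of length k is expected to occur by chance about (N - k + 1) / a^k times. Real sequences are not uniform, so this is only an order-of-magnitude guide, and the function name is hypothetical.

```python
def expected_random_hits(database_len: int, target_len: int, alphabet_size: int) -> float:
    """Expected chance occurrences of one specific target, assuming uniform random composition."""
    positions = max(0, database_len - target_len + 1)
    return positions / alphabet_size ** target_len


print(expected_random_hits(1_000_000, 6, 4))   # about 244 chance hits for a 6-nt target
print(expected_random_hits(1_000_000, 30, 4))  # about 8.7e-13 for a 30-nt target
```

Under these assumptions a 6-nucleotide target is expected a few hundred times in a megabase by chance alone, whereas a target in the preferred 30 to 300 nucleotide range is effectively never expected at random.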

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s) are
chosen based on a three-dimensional configuration which is formed upon the folding
of the target motif. There are a variety of target motifs known in the art. Protein
target motifs include, but are not limited to, enzyme active sites and signal sequences.
Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin
structures and inducible expression elements (protein binding sequences).

Computer software is publicly available that allows a skilled artisan to
access sequence information provided in a computer readable medium for analysis
and comparison to other sequences. A variety of known algorithms are disclosed
publicly, and a variety of commercially available software for conducting searches
is available and can be used in the computer-based systems of the present invention.
Examples of such software include, but are not limited to, MacPattern (EMBL),
BLASTN and BLASTX (NCBI).
For example, software that implements the BLAST (Altschul et al. (1990) J.
Mol. Biol. 215:403-410) and BLAZE (Brutlag et al. (1993) Comp. Chem. 17:203-207)
search algorithms on a Sybase system can be used to identify open reading frames
(ORFs) of the sequences of the invention that contain homology to ORFs or
proteins from other libraries. Such ORFs are protein-encoding fragments and are
useful in producing commercially important proteins, such as enzymes used in various
reactions and in the production of commercially useful metabolites.
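A simplified ORF scan can be sketched as below. It is not BLAST or BLAZE: it only finds ATG-initiated reading frames ending at a standard stop codon on the forward strand (the reverse strand could be scanned by applying the same function to the reverse complement). The function name and the 100-codon minimum are arbitrary assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """ATG-initiated ORFs on the forward strand as (frame, start, end).

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATAAG", min_codons=1))  # [(2, 2, 11)]
```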

Vectors/Host Cells 

The invention also provides vectors containing sulfatase polynucleotides. The
term "vector" refers to a vehicle, preferably a nucleic acid molecule, that can transport
sulfatase polynucleotides. When the vector is a nucleic acid molecule, the sulfatase
polynucleotides are covalently linked to the vector nucleic acid. With this aspect of the
invention, the vector includes a plasmid, single- or double-stranded phage, a single- or
double-stranded RNA or DNA viral vector, or an artificial chromosome, such as a BAC,
PAC, YAC, or MAC.

A vector can be maintained in the host cell as an extrachromosomal element
where it replicates and produces additional copies of sulfatase polynucleotides.
Alternatively, the vector may integrate into the host cell genome and produce additional
copies of sulfatase polynucleotides when the host cell replicates.

The invention provides vectors for the maintenance (cloning vectors) or vectors
for expression (expression vectors) of sulfatase polynucleotides. The vectors can
function in prokaryotic or eukaryotic cells or in both (shuttle vectors).

Expression vectors contain cis-acting regulatory regions that are operably linked
in the vector to sulfatase polynucleotides such that transcription of the polynucleotides is
allowed in a host cell. The polynucleotides can be introduced into the host cell with a
separate polynucleotide capable of affecting transcription. Thus, the second
polynucleotide may provide a trans-acting factor interacting with the cis-regulatory
control region to allow transcription of sulfatase polynucleotides from the vector.
Alternatively, a trans-acting factor may be supplied by the host cell. Finally, a
trans-acting factor can be produced from the vector itself.
It is understood, however, that in some embodiments, transcription and/or
translation of sulfatase polynucleotides can occur in a cell-free system.

The regulatory sequences to which the polynucleotides described herein can be
operably linked include promoters for directing mRNA transcription. These include, but
are not limited to, the left promoter from bacteriophage λ; the lac, TRP, and TAC
promoters from E. coli; the early and late promoters from SV40; the CMV immediate
early promoter; the adenovirus early and late promoters; and retrovirus long-terminal
repeats.

In addition to control regions that promote transcription, expression vectors may
also include regions that modulate transcription, such as repressor binding sites and
enhancers. Examples include the SV40 enhancer, the cytomegalovirus immediate early
enhancer, the polyoma enhancer, adenovirus enhancers, and retrovirus LTR enhancers.

In addition to containing sites for transcription initiation and control, expression
vectors can also contain sequences necessary for transcription termination and, in the
transcribed region, a ribosome binding site for translation. Other regulatory control
elements for expression include initiation and termination codons as well as
polyadenylation signals. The person of ordinary skill in the art would be aware of the
numerous regulatory sequences that are useful in expression vectors. Such regulatory
sequences are described, for example, in Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY.

A variety of expression vectors can be used to express a sulfatase polynucleotide.
Such vectors include chromosomal, episomal, and virus-derived vectors, for example
vectors derived from bacterial plasmids, from bacteriophage, from yeast episomes, from
yeast chromosomal elements, including yeast artificial chromosomes, and from viruses such
as baculoviruses, papovaviruses such as SV40, Vaccinia viruses, adenoviruses,
poxviruses, pseudorabies viruses, and retroviruses. Vectors may also be derived from
combinations of these sources, such as those derived from plasmid and bacteriophage
genetic elements, e.g., cosmids and phagemids. Appropriate cloning and expression
vectors for prokaryotic and eukaryotic hosts are described in Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY.
The regulatory sequence may provide constitutive expression in one or more host
cells (i.e., tissue specific) or may provide for inducible expression in one or more cell
types, such as by temperature, nutrient additive, or an exogenous factor such as a hormone
or other ligand. A variety of vectors providing for constitutive and inducible expression
in prokaryotic and eukaryotic hosts are well known to those of ordinary skill in the art.

Sulfatase polynucleotides can be inserted into the vector nucleic acid by
well-known methodology. Generally, the DNA sequence that will ultimately be expressed is
joined to an expression vector by cleaving the DNA sequence and the expression vector
with one or more restriction enzymes and then ligating the fragments together.
Procedures for restriction enzyme digestion and ligation are well known to those of
ordinary skill in the art.

The vector containing the appropriate polynucleotide can be introduced into an
appropriate host cell for propagation or expression using well-known techniques.
Bacterial cells include, but are not limited to, E. coli, Streptomyces, and Salmonella
typhimurium. Eukaryotic cells include, but are not limited to, yeast, insect cells such as
Drosophila, animal cells such as COS and CHO cells, and plant cells.

As described herein, it may be desirable to express the polypeptide as a fusion
protein. Accordingly, the invention provides fusion vectors that allow for the production
of sulfatase polypeptides. Fusion vectors can increase the expression of a recombinant
protein, increase the solubility of the recombinant protein, and aid in the purification of
the protein by acting, for example, as a ligand for affinity purification. A proteolytic
cleavage site may be introduced at the junction of the fusion moiety so that the desired
polypeptide can ultimately be separated from the fusion moiety. Proteolytic enzymes
include, but are not limited to, factor Xa, thrombin, and enterokinase. Typical fusion
expression vectors include pGEX (Smith et al. (1988) Gene 67:31-40), pMAL (New
England Biolabs, Beverly, MA), and pRIT5 (Pharmacia, Piscataway, NJ), which fuse
glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to
the target recombinant protein. Examples of suitable inducible non-fusion E. coli
expression vectors include pTrc (Amann et al. (1988) Gene 69:301-315) and pET 11d
(Studier et al. (1990) Gene Expression Technology: Methods in Enzymology 185:60-89).
Recombinant protein expression can be maximized in a host bacterium by
providing a genetic background wherein the host cell has an impaired capacity to
proteolytically cleave the recombinant protein (Gottesman, S. (1990) Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, California
119-128).

It is further recognized that the nucleic acid sequences of the invention can be
altered to contain codons that are preferred, or non-preferred, for a particular
expression system. For example, the nucleic acid can be one in which at least one
codon, and preferably at least 10% or 20% of the codons, have been altered
such that the sequence is optimized for expression in E. coli, yeast, human, insect, or
CHO cells. Methods for determining such codon usage are well known in the art.
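The proportion of altered codons mentioned above can be computed directly, as in the sketch below, which assumes two in-frame coding sequences of equal length; the function name is hypothetical. It counts any codon change and does not itself verify that the changes are synonymous; a real codon-optimization workflow would choose replacements from a codon usage table for the target host.

```python
def fraction_codons_altered(original: str, altered: str) -> float:
    """Fraction of codons that differ between two in-frame coding sequences of equal length."""
    if len(original) != len(altered) or len(original) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    n_codons = len(original) // 3
    if n_codons == 0:
        raise ValueError("empty sequence")
    changed = sum(original[i:i + 3] != altered[i:i + 3]
                  for i in range(0, len(original), 3))
    return changed / n_codons


original = "CTG" * 10
altered = "TTA" + "CTG" * 9  # CTG and TTA both encode leucine
print(fraction_codons_altered(original, altered))  # 0.1
```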
Sulfatase polynucleotides can also be expressed by expression vectors that are
operative in yeast. Examples of vectors for expression in yeast, e.g., S. cerevisiae, include
pYepSec1 (Baldari et al. (1987) EMBO J. 6:229-234), pMFa (Kurjan et al. (1982) Cell
30:933-943), pJRY88 (Schultz et al. (1987) Gene 54:113-123), and pYES2 (Invitrogen
Corporation, San Diego, CA).
Sulfatase polynucleotides can also be expressed in insect cells using, for
example, baculovirus expression vectors. Baculovirus vectors available for expression
of proteins in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith et al.
(1983) Mol. Cell Biol. 3:2156-2165) and the pVL series (Lucklow et al. (1989) Virology
170:31-39).

In certain embodiments of the invention, the polynucleotides described herein are
expressed in mammalian cells using mammalian expression vectors. Examples of
mammalian expression vectors include pCDM8 (Seed, B. (1987) Nature 329:840) and
pMT2PC (Kaufman et al. (1987) EMBO J. 6:187-195).

The expression vectors listed herein are provided only as examples of the
well-known vectors available to those of ordinary skill in the art that would be useful to
express sulfatase polynucleotides. The person of ordinary skill in the art would be aware
of other vectors suitable for maintenance, propagation, or expression of the
polynucleotides described herein. These are found, for example, in Sambrook et al.
(1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

The invention also encompasses vectors in which the nucleic acid sequences
described herein are cloned into the vector in reverse orientation, but operably linked to a
regulatory sequence that permits transcription of antisense RNA. Thus, an antisense
transcript can be produced to all, or to a portion, of the polynucleotide sequences
described herein, including both coding and non-coding regions. Expression of this
antisense RNA is subject to each of the parameters described above in relation to
expression of the sense RNA (regulatory sequences, constitutive or inducible expression,
tissue-specific expression).

The invention also relates to recombinant host cells containing the vectors
described herein. Such host cells include prokaryotic cells, lower eukaryotic cells
such as yeast, other eukaryotic cells such as insect cells, and higher eukaryotic cells such
as mammalian cells.

The recombinant host cells are prepared by introducing the vector constructs
described herein into the cells by techniques readily available to the person of ordinary
skill in the art. These include, but are not limited to, calcium phosphate transfection,
DEAE-dextran-mediated transfection, cationic lipid-mediated transfection,
electroporation, transduction, infection, lipofection, and other techniques such as those
found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2d ed., Cold
Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY).

Host cells can contain more than one vector. Thus, different nucleotide
sequences can be introduced into the same cell on different vectors. Similarly, sulfatase
polynucleotides can be introduced either alone or with other polynucleotides that are not
related to sulfatase polynucleotides, such as those providing trans-acting factors for
expression vectors. When more than one vector is introduced into a cell, the vectors can
be introduced independently, co-introduced, or joined to the sulfatase polynucleotide
vector.

In the case of bacteriophage and viral vectors, these can be introduced into cells
as packaged or encapsulated virus by standard procedures for infection and transduction.
Viral vectors can be replication-competent or replication-defective. Where viral
replication is defective, replication will occur in host cells providing functions that
complement the defects.

Vectors generally include selectable markers that enable the selection of the
subpopulation of cells that contain the recombinant vector constructs. The marker can
be contained in the same vector that contains the polynucleotides described herein or
may be on a separate vector. Markers include tetracycline or ampicillin resistance genes
for prokaryotic host cells and dihydrofolate reductase or neomycin resistance for
eukaryotic host cells. However, any marker that provides selection for a phenotypic trait
will be effective.

While the mature proteins can be produced in bacteria, yeast, mammalian cells,
and other cells under the control of the appropriate regulatory sequences, cell-free
transcription and translation systems can also be used to produce these proteins using
RNA derived from the DNA constructs described herein.

Where secretion of the polypeptide is desired, appropriate secretion signals are
incorporated into the vector. The signal sequence can be endogenous to the sulfatase
polypeptides or heterologous to these polypeptides.

Where the polypeptide is not secreted into the medium, the protein can be
isolated from the host cell by standard disruption procedures, including freeze-thaw,
sonication, mechanical disruption, use of lysing agents, and the like. The polypeptide can
then be recovered and purified by well-known purification methods, including
ammonium sulfate precipitation, acid extraction, anion or cation exchange
chromatography, phosphocellulose chromatography, hydrophobic-interaction
chromatography, affinity chromatography, hydroxylapatite chromatography, lectin
chromatography, or high performance liquid chromatography.

It is also understood that, depending upon the host cell used for recombinant
production, the polypeptides described herein can have various glycosylation patterns
or may be non-glycosylated, as when produced in bacteria. In addition, the polypeptides
may in some cases include an initial modified methionine as a result of a host-mediated
process.

Uses of Vectors and Host Cells 

It is understood that "host cells" and "recombinant host cells" refer not only to
the particular subject cell but also to the progeny or potential progeny of such a cell.
Because certain modifications may occur in succeeding generations due to either
mutation or environmental influences, such progeny may not, in fact, be identical to
the parent cell, but are still included within the scope of the term as used herein. A
"purified preparation of cells", as used herein, refers, in the case of plant or animal
cells, to an in vitro preparation of cells and not an entire intact plant or animal. In the
case of cultured cells or microbial cells, it consists of a preparation of at least 10%,
and more preferably at least 50%, of the subject cells.
The host cells expressing the polypeptides described herein, and particularly
recombinant host cells, have a variety of uses. First, the cells are useful for producing
sulfatase proteins or polypeptides that can be further purified to produce desired amounts
of sulfatase protein or fragments. Thus, host cells containing expression vectors are
useful for polypeptide production.
Host cells are also useful for conducting cell-based assays involving sulfatase or
sulfatase fragments. Thus, a recombinant host cell expressing a native sulfatase is useful
to assay for compounds that stimulate or inhibit sulfatase function, gene expression at
the level of transcription or translation, and interaction with other cellular components.
Host cells are also useful for identifying sulfatase mutants in which these
functions are affected. If the mutants naturally occur and give rise to a pathology, host
cells containing the mutations are useful to assay compounds that have a desired effect
on the mutant sulfatase (for example, stimulating or inhibiting function) which may not
be indicated by their effect on the native sulfatase.

Recombinant host cells are also useful for expressing the chimeric polypeptides
described herein to assess compounds that activate or suppress activation by means of a
heterologous domain, segment, site, and the like, as disclosed herein.

Further, mutant sulfatases can be designed in which one or more of the various
functions is engineered to be increased or decreased and used to augment or replace
sulfatase proteins in an individual. Thus, host cells can provide a therapeutic benefit by
replacing an aberrant sulfatase or by providing an aberrant sulfatase that produces a
therapeutic result. In one embodiment, the cells provide sulfatases that are abnormally
active.

In another embodiment, the cells provide sulfatases that are abnormally inactive.
These sulfatases can compete with endogenous sulfatases in the individual.

In another embodiment, cells expressing sulfatases that cannot be activated are
introduced into an individual in order to compete with endogenous sulfatases for
substrate. For example, where excessive substrate or substrate analog is part of a
treatment modality, it may be necessary to effectively inactivate the substrate or
substrate analog at a specific point in treatment. Providing cells that compete for the
molecule, but which cannot be affected by sulfatase activation, would be beneficial.

Homologously recombinant host cells can also be produced that allow the in situ
alteration of endogenous sulfatase polynucleotide sequences in a host cell genome. The
host cell includes, but is not limited to, a stable cell line, a cell in vivo, or a cloned
microorganism. This technology is more fully described in WO 93/09222, WO
91/12650, WO 91/06667, U.S. 5,272,071, and U.S. 5,641,670. Briefly, specific
polynucleotide sequences corresponding to the sulfatase polynucleotides, or sequences
proximal or distal to a sulfatase gene, are allowed to integrate into a host cell genome by
homologous recombination, where expression of the gene can be affected. In one
embodiment, regulatory sequences are introduced that either increase or decrease
expression of an endogenous sequence. Accordingly, a sulfatase protein can be
produced in a cell not normally producing it. Alternatively, increased expression of
sulfatase protein can be effected in a cell normally producing the protein at a specific
level. Further, expression can be decreased or eliminated by introducing a specific
regulatory sequence. The regulatory sequence can be heterologous to the sulfatase
protein sequence or can be a homologous sequence with a desired mutation that affects
expression. Alternatively, the entire gene can be deleted. The regulatory sequence can
be specific to the host cell or capable of functioning in more than one cell type. Still
further, specific mutations can be introduced into any desired region of the gene to
produce mutant sulfatase proteins. Such mutations could be introduced, for example,
into specific functional regions such as the peptide substrate-binding site.

In one embodiment, the host cell can be a fertilized oocyte or embryonic stem
cell that can be used to produce a transgenic animal containing the altered sulfatase gene.
Alternatively, the host cell can be a stem cell or other early tissue precursor that gives
rise to a specific subset of cells and can be used to produce transgenic tissues in an
animal. See also Thomas et al. (1987) Cell 51:503 for a description of homologous
recombination vectors. The vector is introduced into an embryonic stem cell line (e.g.,
by electroporation), and cells in which the introduced gene has homologously
recombined with the endogenous sulfatase gene are selected (see, e.g., Li, E. et al. (1992)
Cell 69:915). The selected cells are then injected into a blastocyst of an animal (e.g., a
mouse) to form aggregation chimeras (see, e.g., Bradley, A. in Teratocarcinomas and
Embryonic Stem Cells: A Practical Approach, E.J. Robertson, ed. (IRL, Oxford, 1987)
pp. 113-152). A chimeric embryo can then be implanted into a suitable pseudopregnant
female foster animal and the embryo brought to term. Progeny harboring the
homologously recombined DNA in their germ cells can be used to breed animals in
which all cells of the animal contain the homologously recombined DNA by germline
transmission of the transgene. Methods for constructing homologous recombination
vectors and homologous recombinant animals are described further in Bradley, A.
(1991) Current Opinion in Biotechnology 2:823-829 and in PCT International
Publication Nos. WO 90/11354, WO 91/01140, and WO 93/04169.

The genetically engineered host cells can be used to produce non-human
transgenic animals. A transgenic animal is preferably a mammal, for example a rodent
such as a rat or mouse, in which one or more of the cells of the animal include a
transgene. A transgene is exogenous DNA that is integrated into the genome of a cell
from which a transgenic animal develops and that remains in the genome of the
mature animal in one or more cell types or tissues of the transgenic animal. These
animals are useful for studying the function of a sulfatase protein and for identifying and
evaluating modulators of sulfatase protein activity.

Other examples of transgenic animals include non-human primates, sheep, dogs,
cows, goats, chickens, and amphibians.

In one embodiment, a host cell is a fertilized oocyte or an embryonic stem cell
into which sulfatase polynucleotide sequences have been introduced.

A transgenic animal can be produced by introducing nucleic acid into the male
pronucleus of a fertilized oocyte, e.g., by microinjection or retroviral infection, and
allowing the oocyte to develop in a pseudopregnant female foster animal. Any of the
sulfatase nucleotide sequences can be introduced as a transgene into the genome of a
non-human animal, such as a mouse.

Any of the regulatory or other sequences useful in expression vectors can form
part of the transgenic sequence. This includes intronic sequences and polyadenylation
signals, if not already included. A tissue-specific regulatory sequence(s) can be operably
linked to the transgene to direct expression of the sulfatase protein to particular cells.
Methods for generating transgenic animals via embryo manipulation and
microinjection, particularly animals such as mice, have become conventional in the art
and are described, for example, in U.S. Patent Nos. 4,736,866 and 4,870,009, both by
Leder et al., U.S. Patent No. 4,873,191 by Wagner et al., and in Hogan, B., Manipulating
the Mouse Embryo (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.,
1986). Similar methods are used for production of other transgenic animals. A
transgenic founder animal can be identified based upon the presence of the transgene in
its genome and/or expression of transgenic mRNA in tissues or cells of the animal. A
transgenic founder animal can then be used to breed additional animals carrying the
transgene. Moreover, transgenic animals carrying a transgene can further be bred to
other transgenic animals carrying other transgenes. A transgenic animal also includes
animals in which the entire animal or tissues in the animal have been produced using the
homologously recombinant host cells described herein.

In another embodiment, transgenic non-human animals can be produced that
contain selected systems allowing for regulated expression of the transgene. One
example of such a system is the cre/loxP recombinase system of bacteriophage P1. For
a description of the cre/loxP recombinase system, see, e.g., Lakso et al. (1992) PNAS
89:6232-6236. Another example of a recombinase system is the FLP recombinase
system of S. cerevisiae (O'Gorman et al. (1991) Science 251:1351-1355). If a cre/loxP
recombinase system is used to regulate expression of the transgene, animals containing
transgenes encoding both the Cre recombinase and a selected protein are required. Such
animals can be provided through the construction of "double" transgenic animals, e.g.,
by mating two transgenic animals, one containing a transgene encoding a selected
protein and the other containing a transgene encoding a recombinase.
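By way of illustration only, the expected outcome of the mating described above can be tallied in a short Python sketch. It assumes, without support from the source, that each parent is hemizygous for a single transgene, that the Cre and target transgenes are unlinked, and that each is transmitted in Mendelian fashion with probability 1/2; the labels `cre+` and `target+` and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: each founder is hemizygous for one unlinked transgene, so each
# parent passes its transgene to a given pup with probability 1/2.
def offspring_genotype_fractions():
    """Return the expected fraction of pups in each transgene class."""
    # Each outcome is (inherited Cre transgene?, inherited target transgene?).
    outcomes = list(product([True, False], repeat=2))
    fractions = {}
    for has_cre, has_target in outcomes:
        label = ("cre+" if has_cre else "cre-") + "/" + ("target+" if has_target else "target-")
        # The four outcomes are equally likely under the assumptions above.
        fractions[label] = Fraction(1, len(outcomes))
    return fractions

result = offspring_genotype_fractions()
for genotype, frac in sorted(result.items()):
    print(genotype, frac)
# The double transgenic class "cre+/target+" is expected in 1/4 of pups.
```

Under these assumptions, about one quarter of the pups would carry both the recombinase and the selected protein; linked transgenes or homozygous parents would change these fractions.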
Clones of the non-human transgenic animals described herein can also be
produced according to the methods described in Wilmut et al. (1997) Nature 385:810-813
and PCT International Publication Nos. WO 97/07668 and WO 97/07669. In brief,
a cell, e.g., a somatic cell, from the transgenic animal can be isolated and induced to exit
the growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through
the use of electrical pulses, to an enucleated oocyte from an animal of the same species
from which the quiescent cell is isolated. The reconstructed oocyte is then cultured such
that it develops to morula or blastocyst and is then transferred to a pseudopregnant female
foster animal. The offspring born of this female animal will be a clone of the animal
from which the cell, e.g., the somatic cell, is isolated.

Transgenic animals containing recombinant cells that express the polypeptides
described herein are useful to conduct the assays described herein in an in vivo context,
because various physiological factors that are present in vivo and that could affect
binding or activation may not be evident from in vitro cell-free or cell-based assays.
Accordingly, it is useful to provide non-human transgenic animals to assay in vivo
sulfatase function, including peptide interaction, the effect of specific mutant
sulfatases on sulfatase function and peptide interaction, and the effect of chimeric
sulfatases. It is also possible to assess the effect of null mutations, that is, mutations that
substantially or completely eliminate one or more sulfatase functions.

In general, methods for producing transgenic animals include introducing a
nucleic acid sequence according to the present invention, the nucleic acid sequence
capable of expressing the protein in a transgenic animal, into a cell in culture or in
vivo. When introduced in vivo, the nucleic acid is introduced into an intact organism
such that one or more cell types and, accordingly, one or more tissue types, express
the nucleic acid encoding the protein. Alternatively, the nucleic acid can be
introduced into virtually all cells in an organism by transfecting a cell in culture, such
as an embryonic stem cell, as described herein for the production of transgenic
animals, and this cell can be used to produce an entire transgenic organism. As
described, in a further embodiment, the host cell can be a fertilized oocyte. Such cells
are then allowed to develop in a female foster animal to produce the transgenic
organism.

Pharmaceutical Compositions

Sulfatase nucleic acid molecules, proteins, modulators of the protein, and
antibodies (also referred to herein as "active compounds") can be incorporated into
pharmaceutical compositions suitable for administration to a subject, e.g., a human.
Such compositions typically comprise the nucleic acid molecule, protein, modulator, or
antibody and a pharmaceutically acceptable carrier.

The term "administer" is used in its broadest sense and includes any method of
introducing the compositions of the present invention into a subject. This includes
producing polypeptides or polynucleotides in vivo by in vivo transcription or translation
of polynucleotides that have been exogenously introduced into a subject. Thus,
polypeptides or nucleic acids produced in the subject from the exogenous compositions
are encompassed in the term "administer."
As used herein, the language "pharmaceutically acceptable carrier" is intended to
include any and all solvents, dispersion media, coatings, antibacterial and antifungal
agents, isotonic and absorption delaying agents, and the like, compatible with
pharmaceutical administration. The use of such media and agents for pharmaceutically
active substances is well known in the art. Except insofar as any conventional media or
agent is incompatible with the active compound, such media can be used in the
compositions of the invention. Supplementary active compounds can also be
incorporated into the compositions. A pharmaceutical composition of the invention is
formulated to be compatible with its intended route of administration. Examples of
routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous,
oral (e.g., inhalation), transdermal (topical), transmucosal, and rectal administration.

Solutions or suspensions used for parenteral, intradermal, or subcutaneous application
can include the following components: a sterile diluent such as water for injection, saline
solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic
solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants
such as ascorbic acid or sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and
agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The
parenteral preparation can be enclosed in ampules, disposable syringes or multiple dose
vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous
solutions (where water soluble) or dispersions and sterile powders for the
extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous
administration, suitable carriers include physiological saline, bacteriostatic water,
Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS). In all
cases, the composition must be sterile and should be fluid to the extent that easy
syringability exists. It must be stable under the conditions of manufacture and storage
and must be preserved against the contaminating action of microorganisms such as
bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for
example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid
polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can
be maintained, for example, by the use of a coating such as lecithin, by the maintenance
of the required particle size in the case of dispersion, and by the use of surfactants.
Prevention of the action of microorganisms can be achieved by various antibacterial and
antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid,
thimerosal, and the like. In many cases, it will be preferable to include isotonic agents,
for example, sugars, polyalcohols such as mannitol or sorbitol, and sodium chloride, in the
composition. Prolonged absorption of the injectable compositions can be brought about
by including in the composition an agent that delays absorption, for example,
aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active
compound (e.g., a sulfatase protein or anti-sulfatase antibody) in the required amount in
an appropriate solvent with one or a combination of the ingredients enumerated above, as
required, followed by filtered sterilization. Generally, dispersions are prepared by
incorporating the active compound into a sterile vehicle that contains a basic
dispersion medium and the required other ingredients from those enumerated above. In
the case of sterile powders for the preparation of sterile injectable solutions, the preferred
methods of preparation are vacuum drying and freeze-drying, which yield a powder of
the active ingredient plus any additional desired ingredient from a previously
sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They
can be enclosed in gelatin capsules or compressed into tablets. For oral administration,
the agent can be contained in enteric forms to survive the stomach, or further coated or
mixed to be released in a particular region of the GI tract by known methods. For the
purpose of oral therapeutic administration, the active compound can be incorporated
with excipients and used in the form of tablets, troches, or capsules. Oral compositions
can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound
in the fluid carrier is applied orally and swished and expectorated or swallowed.
Pharmaceutically compatible binding agents and/or adjuvant materials can be included
as part of the composition. The tablets, pills, capsules, troches and the like can contain
any of the following ingredients, or compounds of a similar nature: a binder such as
microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or
lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant
such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a
sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint,
methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an
aerosol spray from a pressurized container or dispenser that contains a suitable
propellant, e.g., a gas such as carbon dioxide, or from a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For
transmucosal or transdermal administration, penetrants appropriate to the barrier to be
permeated are used in the formulation. Such penetrants are generally known in the art
and include, for example, for transmucosal administration, detergents, bile salts, and
fusidic acid derivatives. Transmucosal administration can be accomplished through the
use of nasal sprays or suppositories. For transdermal administration, the active
compounds are formulated into ointments, salves, gels, or creams as generally known in
the art.

The compounds can also be prepared in the form of suppositories (e.g., with
conventional suppository bases such as cocoa butter and other glycerides) or retention
enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will
protect the compound against rapid elimination from the body, such as a controlled
release formulation, including implants and microencapsulated delivery systems.
Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate,
polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid.
Methods for preparation of such formulations will be apparent to those skilled in the art.
The materials can also be obtained commercially from Alza Corporation and Nova
Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected
cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically
acceptable carriers. These can be prepared according to methods known to those skilled
in the art, for example, as described in U.S. Patent No. 4,522,811.
It is especially advantageous to formulate oral or parenteral compositions in
dosage unit form for ease of administration and uniformity of dosage. "Dosage unit
form" as used herein refers to physically discrete units suited as unitary dosages for the
subject to be treated, each unit containing a predetermined quantity of active compound
calculated to produce the desired therapeutic effect in association with the required
pharmaceutical carrier. The specification for the dosage unit forms of the invention is
dictated by and directly dependent on the unique characteristics of the active compound
and the particular therapeutic effect to be achieved, and the limitations inherent in the art
of compounding such an active compound for the treatment of individuals.
The nucleic acid molecules of the invention can be inserted into vectors and used
as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for
example, intravenous injection, local administration (U.S. 5,328,470), or stereotactic
injection (see, e.g., Chen et al. (1994) PNAS 91:3054-3057). The pharmaceutical
preparation of the gene therapy vector can include the gene therapy vector in an
acceptable diluent, or can comprise a slow release matrix in which the gene delivery
vehicle is embedded. Alternatively, where the complete gene delivery vector can be
produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical
preparation can include one or more cells that produce the gene delivery system.

As defined herein, a therapeutically effective amount of protein or polypeptide
(i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight,
preferably about 0.01 to 25 mg/kg body weight, more preferably about 0.1 to 20
mg/kg body weight, and even more preferably about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to
8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight.
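As a worked illustration, a per-kilogram range converts to a total amount per administration by multiplying each bound by the subject's body weight. The sketch below encodes four of the ranges recited above; the 70 kg body weight and the function name `total_dose_range_mg` are assumptions for illustration and do not form part of the source.

```python
# Dose ranges recited above, in mg of protein or polypeptide per kg body weight.
DOSE_RANGES_MG_PER_KG = {
    "broadest": (0.001, 30.0),
    "preferred": (0.01, 25.0),
    "more preferred": (0.1, 20.0),
    "even more preferred": (1.0, 10.0),
}

def total_dose_range_mg(body_weight_kg, range_name):
    """Scale a per-kg dose range to a total dose range (mg) for one subject.

    Hypothetical helper: multiplies both bounds by the body weight.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = DOSE_RANGES_MG_PER_KG[range_name]
    return low * body_weight_kg, high * body_weight_kg

# Illustrative 70 kg subject (an assumption, not from the source).
print(total_dose_range_mg(70, "even more preferred"))  # (70.0, 700.0)
```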

The skilled artisan will appreciate that certain factors may influence the
dosage required to effectively treat a subject, including but not limited to the severity
of the disease or disorder, previous treatments, the general health and/or age of the
subject, and other diseases present. Moreover, treatment of a subject with a
therapeutically effective amount of a protein, polypeptide, or antibody can include a
single treatment or, preferably, can include a series of treatments. In a preferred
example, a subject is treated with antibody, protein, or polypeptide in the range of
between about 0.1 and 20 mg/kg body weight, one time per week for between about 1
and 10 weeks, preferably between about 2 and 8 weeks, more preferably between about
3 and 7 weeks, and even more preferably for about 4, 5, or 6 weeks. It will also be
appreciated that the effective dosage of antibody, protein, or polypeptide used for
treatment may increase or decrease over the course of a particular treatment. Changes
in dosage may result and become apparent from the results of diagnostic assays as
described herein.
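The preferred once-weekly regimen can be sketched as a simple schedule under stated assumptions: a constant weight-based dose (the text notes that dosage may in fact rise or fall during treatment), a hypothetical 70 kg subject, and a hypothetical helper `weekly_schedule` that rejects values outside the 0.1 to 20 mg/kg and 1 to 10 week ranges recited above.

```python
def weekly_schedule(dose_mg_per_kg, body_weight_kg, weeks):
    """Return (week, dose_mg) pairs for once-weekly dosing at a constant dose.

    Hypothetical helper; the bounds mirror the preferred example in the text.
    """
    if not 0.1 <= dose_mg_per_kg <= 20:
        raise ValueError("dose outside the 0.1-20 mg/kg range given in the text")
    if not 1 <= weeks <= 10:
        raise ValueError("duration outside the 1-10 week range given in the text")
    per_dose_mg = dose_mg_per_kg * body_weight_kg
    return [(week, per_dose_mg) for week in range(1, weeks + 1)]

# Illustrative values: 5 mg/kg for a 70 kg subject over 4 weeks.
schedule = weekly_schedule(dose_mg_per_kg=5, body_weight_kg=70, weeks=4)
cumulative_mg = sum(dose for _, dose in schedule)
print(schedule)       # [(1, 350), (2, 350), (3, 350), (4, 350)]
print(cumulative_mg)  # 1400
```

A schedule whose dose changes between weeks, as the text contemplates, would replace the constant `per_dose_mg` with a per-week value.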

The present invention encompasses agents that modulate expression or
activity. An agent may, for example, be a small molecule. Such small
molecules include, but are not limited to, peptides, peptidomimetics, amino acids,
amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide
analogs, organic or inorganic compounds (i.e., including heteroorganic and
organometallic compounds) having a molecular weight less than about 10,000 grams
per mole, organic or inorganic compounds having a molecular weight less than about
5,000 grams per mole, organic or inorganic compounds having a molecular weight
less than about 1,000 grams per mole, organic or inorganic compounds having a
molecular weight less than about 500 grams per mole, and salts, esters, and other
pharmaceutically acceptable forms of such compounds.

It is understood that appropriate doses of small molecule agents depend upon
a number of factors within the ken of the ordinarily skilled physician, veterinarian, or
researcher. The dose(s) of the small molecule will vary, for example, depending upon
the identity, size, and condition of the subject or sample being treated, further
depending upon the route by which the composition is to be administered, if
applicable, and the effect that the practitioner desires the small molecule to have
upon the nucleic acid or polypeptide of the invention. Exemplary doses include
milligram or microgram amounts of the small molecule per kilogram of subject or
sample weight (e.g., about 1 microgram per kilogram to about 500 milligrams per
kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or
about 1 microgram per kilogram to about 50 micrograms per kilogram). It is
furthermore understood that appropriate doses of a small molecule depend upon the
potency of the small molecule with respect to the expression or activity to be
modulated. Such appropriate doses may be determined using the assays described
herein. When one or more of these small molecules is to be administered to an animal
(e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic
acid of the invention, a physician, veterinarian, or researcher may, for example,
prescribe a relatively low dose at first, subsequently increasing the dose until an
appropriate response is obtained. In addition, it is understood that the specific dose
level for any particular animal subject will depend upon a variety of factors, including
the activity of the specific compound employed, the age, body weight, general health,
gender, and diet of the subject, the time of administration, the route of administration,
the rate of excretion, any drug combination, and the degree of expression or activity to
be modulated.
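The approach of starting with a relatively low dose and increasing it until an appropriate response is obtained can be sketched as a geometric escalation loop. Everything concrete here is an assumption: doses are integers in micrograms per kilogram, the start and ceiling echo the exemplary range of about 1 microgram to about 500 milligrams per kilogram, each step doubles the dose, and `is_adequate` is a hypothetical stand-in for whatever response measurement is actually used.

```python
def escalate_dose(start_dose, max_dose, factor, is_adequate):
    """Return the first dose in a geometric escalation that gives an adequate
    response, or None if max_dose is exceeded first.

    start_dose, max_dose: doses in the same units (here, micrograms per kg).
    factor: multiplicative step greater than 1 (an illustrative assumption).
    is_adequate: hypothetical callable standing in for the response readout.
    """
    if start_dose <= 0 or factor <= 1:
        raise ValueError("start_dose must be positive and factor must exceed 1")
    dose = start_dose
    while dose <= max_dose:
        if is_adequate(dose):
            return dose
        dose *= factor
    return None

# 1 ug/kg start, 500 mg/kg (500,000 ug/kg) ceiling, doubling each step.
# Hypothetical readout: the response is adequate once the dose reaches 40 ug/kg.
print(escalate_dose(1, 500_000, 2, lambda d: d >= 40))  # 64
```

Returning `None` when the ceiling is passed keeps the loop from escalating beyond the stated range.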

The pharmaceutical compositions can be included in a container, pack, or
dispenser together with instructions for administration.



Other Embodiments 

In another aspect, the invention features a method of analyzing a plurality of
capture probes. The method can be used, e.g., to analyze gene expression. The
method includes: providing a two-dimensional array having a plurality of addresses,
each address of the plurality being positionally distinguishable from each other
address of the plurality, and each address of the plurality having a unique capture
probe, e.g., a nucleic acid or peptide sequence; contacting the array with a 22438,
23553, 25278, or 26212 nucleic acid, preferably purified, polypeptide, preferably
purified, or antibody; and thereby evaluating the plurality of capture probes. Binding,
e.g., in the case of a nucleic acid, hybridization with a capture probe at an address of
the plurality, is detected, e.g., by signal generated from a label attached to the 22438,
23553, 25278, or 26212 nucleic acid, polypeptide, or antibody.

The capture probes can be a set of nucleic acids from a selected sample, e.g., a
sample of nucleic acids derived from a control or non-stimulated tissue or cell.

The method can include contacting the 22438, 23553, 25278, or 26212 nucleic
acid, polypeptide, or antibody with a first array having a plurality of capture probes
and a second array having a different plurality of capture probes. The results of each
hybridization can be compared, e.g., to analyze differences in expression between a
first and second sample. The first plurality of capture probes can be from a control
sample, e.g., a wild-type, normal, or non-diseased, non-stimulated sample, e.g., a
biological fluid, tissue, or cell sample. The second plurality of capture probes can be
from an experimental sample, e.g., a mutant-type, at-risk, disease-state or disorder-state,
or stimulated sample, e.g., a biological fluid, tissue, or cell sample.
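Comparing the results from the two arrays can be sketched as a per-probe log2 ratio of experimental to control signal. This minimal Python example assumes each array yields one background-subtracted, normalized signal per capture-probe identifier, that probes are matched by identifier across arrays, and that a two-fold change (|log2 ratio| of at least 1) is the cutoff of interest; the probe names, signal values, pseudocount, and cutoff are all hypothetical.

```python
import math


def log2_ratios(control: dict[str, float], experimental: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """log2((experimental + c) / (control + c)) for probes present on both arrays."""
    shared = sorted(control.keys() & experimental.keys())
    return {
        probe: math.log2((experimental[probe] + pseudocount) / (control[probe] + pseudocount))
        for probe in shared
    }


def differential_probes(ratios: dict[str, float], min_abs_log2: float = 1.0) -> list[str]:
    """Probes whose signal changes at least two-fold (|log2 ratio| >= 1 by default)."""
    return [probe for probe, r in ratios.items() if abs(r) >= min_abs_log2]


if __name__ == "__main__":
    # Hypothetical normalized signals per capture probe.
    control = {"probe_A": 99.0, "probe_B": 399.0, "probe_C": 50.0}
    experimental = {"probe_A": 399.0, "probe_B": 99.0, "probe_C": 52.0}
    ratios = log2_ratios(control, experimental)
    print(ratios)
    print(differential_probes(ratios))  # probe_A up four-fold, probe_B down four-fold
```

The pseudocount keeps the ratio defined when a control signal is zero; real array analyses would also replicate and test for significance rather than rely on a fixed fold-change cutoff.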

The plurality of capture probes can be a plurality of nucleic acid probes, each
of which specifically hybridizes with an allele of 22438, 23553, 25278, or 26212.
Such methods can be used to diagnose a subject, e.g., to evaluate risk for a disease or
disorder, to evaluate suitability of a selected treatment for a subject, or to evaluate
whether a subject has a disease or disorder. Because 22438, 23553, 25278, and 26212
are associated with sulfatase activity, the method is useful for disorders associated
with abnormal sulfatase activity.
The method can be used to detect SNPs, as described below.

In another aspect, the invention features a method of analyzing a plurality of
probes. The method is useful, e.g., for analyzing gene expression. The method
includes: providing a two-dimensional array having a plurality of addresses, each
address of the plurality being positionally distinguishable from each other address of
the plurality, and each address of the plurality having a unique capture probe, e.g.,
wherein the capture probes are from a cell or subject which expresses or misexpresses
22438, 23553, 25278, or 26212, or from a cell or subject in which a 22438, 23553,
25278, or 26212 mediated response has been elicited, e.g., by contact of the cell with
22438, 23553, 25278, or 26212 nucleic acid or protein, or by administration to the cell
or subject of 22438, 23553, 25278, or 26212 nucleic acid or protein; contacting the
array with one or more inquiry probes, wherein an inquiry probe can be a nucleic acid,
polypeptide, or antibody (which is preferably other than a 22438, 23553, 25278, or
26212 nucleic acid, polypeptide, or antibody); providing a second two-dimensional
array having a plurality of addresses, each address of the plurality being positionally
distinguishable from each other address of the plurality, and each address of the
plurality having a unique capture probe, e.g., wherein the capture probes are from a
cell or subject which does not express 22438, 23553, 25278, or 26212 (or does not
express it as highly as in the case of the 22438, 23553, 25278, or 26212 positive
plurality of capture probes) or from a cell or subject in which a 22438, 23553, 25278,
or 26212 mediated response has not been elicited (or has been elicited to a lesser
extent than in the first sample); contacting the second array with one or more inquiry
probes (each of which is preferably other than a 22438, 23553, 25278, or 26212
nucleic acid, polypeptide, or antibody); and thereby evaluating the plurality of
capture probes. Binding, e.g., in the case of a nucleic acid, hybridization with a
capture probe at an address of the plurality, is detected, e.g., by signal generated from
a label attached to the nucleic acid, polypeptide, or antibody.

In another aspect, the invention features a method of analyzing 22438, 23553,
25278, or 26212, e.g., analyzing structure, function, or relatedness to other nucleic
acid or amino acid sequences. The method includes: providing a 22438, 23553,
25278, or 26212 nucleic acid or amino acid sequence; and comparing the 22438, 23553,
25278, or 26212 sequence with one or more, preferably a plurality, of sequences from a
collection of sequences, e.g., a nucleic acid or protein sequence database, to thereby
analyze 22438, 23553, 25278, or 26212.

Preferred databases include GenBank™. The method can include evaluating
the sequence identity between a 22438, 23553, 25278, or 26212 sequence and a 
database sequence. The method can be performed by accessing the database at a 
second site, e.g., over the internet. 
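Sequence identity between a query and a database sequence is usually computed from an alignment. The sketch below assumes the two sequences are already aligned to equal length (gaps written as '-'), counts positions where both carry the same residue, and divides by the alignment length; this is one common convention, and alignment programs differ in how they treat gaps and overhangs. It also reports the highest of the identity tiers recited in this document (60% to 100%) that a pair meets. The example sequences are hypothetical.

```python
TIERS = (100, 99, 95, 90, 80, 70, 60)  # identity thresholds (%) recited in the document


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an equal-length pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def best_tier(identity: float):
    """Return the highest tier met, or None if identity is below 60%."""
    for tier in TIERS:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    query = "MKTAYIAK-LQ"    # hypothetical aligned query
    subject = "MKTAHIAKQLQ"  # hypothetical aligned database hit
    pid = percent_identity(query, subject)
    print(f"{pid:.1f}% identity, tier {best_tier(pid)}")
```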

In another aspect, the invention features a set of oligonucleotides, useful, e.g.,
for identifying SNPs or specific alleles of 22438, 23553, 25278, or
26212. The set includes a plurality of oligonucleotides, each of which has a different
nucleotide at an interrogation position, e.g., an SNP or the site of a mutation. In a
preferred embodiment, the oligonucleotides of the plurality are identical in sequence
with one another apart from the interrogation position (and except for differences in
length). The oligonucleotides can be provided with different labels, such that an
oligonucleotide which hybridizes to one allele provides a signal that is distinguishable
from an oligonucleotide which hybridizes to a second allele.
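The allele-specific oligonucleotide set described above can be sketched as one oligonucleotide per possible base at the interrogation position, each otherwise identical and each paired with its own label, so that hybridization to one allele is distinguishable from hybridization to another. The template sequence, position, and label names below are hypothetical, and calling the allele from the strongest label signal is a simplification of real genotyping analysis.

```python
BASES = "ACGT"


def allele_probe_set(template: str, position: int) -> dict[str, tuple[str, str]]:
    """One oligo per base at `position` (0-based), each paired with a distinct label name.

    All oligos are identical to `template` except at the interrogation position.
    Label names are placeholders for, e.g., different fluorophores.
    """
    if not 0 <= position < len(template):
        raise IndexError("interrogation position lies outside the template")
    return {
        base: (template[:position] + base + template[position + 1:], f"label_{base}")
        for base in BASES
    }


def call_allele(signal_by_base: dict[str, float]) -> str:
    """Return the base whose labeled oligo gave the strongest hybridization signal."""
    return max(signal_by_base, key=signal_by_base.get)


if __name__ == "__main__":
    template = "GGATCCTAGCTAGGC"  # hypothetical 15-mer around a SNP
    for base, (oligo, label) in allele_probe_set(template, 7).items():
        print(base, oligo, label)
    print(call_allele({"A": 120.0, "C": 15.0, "G": 900.0, "T": 20.0}))
```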

This invention is further illustrated by the following examples, which should
not be construed as limiting. The contents of all references, patents, and published
patent applications cited throughout this application are incorporated herein by
reference.
EXAMPLES

Example 1: Identification and Characterization of Human 22438 cDNAs

The human 22438 sequence (Figure 1A-B; SEQ ID NO:2), which is
approximately 2175 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1578 nucleotides
(nucleotides 248-1825 of SEQ ID NO:2; SEQ ID NO:11). The coding sequence
encodes a 525 amino acid protein (SEQ ID NO:1).

PFAM analysis indicates that 22438 contains a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/software/packages/pfam/pfam.html.
As used herein, the term "sulfatase domain" includes an amino acid sequence
of about 80-420 amino acid residues in length and having a bit score for the alignment
of the sequence to the sulfatase domain (HMM) of at least 8. Preferably, a sulfatase
domain includes at least about 100-250 amino acids, more preferably about 130-200
amino acid residues, or about 160-200 amino acids, and has a bit score for the
alignment of the sequence to the sulfatase domain (HMM) of at least 16.
The sulfatase domain (HMM) has been assigned the PFAM Accession PF00884
(http://pfam.wustl.edu/). An alignment of the sulfatase domain (amino acids 36-462
of SEQ ID NO:1) of human 22438 with a consensus amino acid sequence derived
from a hidden Markov model is depicted in Figure 19.
In a preferred embodiment, a 22438-like polypeptide or protein has a "sulfatase
domain" or a region which includes at least about 100-250, more preferably about
130-200 or 160-200, amino acid residues and has at least about 60%, 70%, 80%, 90%,
95%, 99%, or 100% sequence identity with a "sulfatase domain," e.g., the sulfatase
domain of human 22438-like polypeptide or protein (e.g., amino acid residues 36-462
of SEQ ID NO:1).

To identify the presence of a "sulfatase" domain in a 22438-like protein
sequence, and make the determination that a polypeptide or protein of interest has a
particular profile, the amino acid sequence of the protein can be searched against a
database of HMMs (e.g., the Pfam database, release 2.1) using the default parameters
(http://www.sanger.ac.uk/Software/Pfam/HMM_se...). For example, the hmmsf
program, which is available as part of the HMMER package of search programs, is a
family-specific default program for MILPAT0063, and a score of 15 is the default
threshold score for determining a hit. Alternatively, the threshold score for
determining a hit can be lowered (e.g., to 8 bits). A description of the Pfam database
can be found in Sonnhammer et al. (1997) Proteins 28(3):405-420, and a detailed
description of HMMs can be found, for example, in Gribskov et al. (1990) Meth.
Enzymol. 183:146-159; Gribskov et al. (1987) Proc. Natl. Acad. Sci. USA
84:4355-4358; Krogh et al. (1994) J. Mol. Biol. 235:1501-1531; and Stultz et al.
(1993) Protein Sci. 2:305-314, the contents of which are incorporated herein by
reference.
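The hit-calling rule above (a default threshold of 15 bits, optionally lowered to 8 bits) amounts to filtering domain hits by bit score. The Python sketch below assumes the search output has already been parsed into simple records; it does not call or parse HMMER itself, and the query names, coordinates, and scores are hypothetical.

```python
from dataclasses import dataclass

SULFATASE_ACCESSION = "PF00884"  # Pfam accession for the sulfatase domain, per the text
DEFAULT_THRESHOLD = 15.0         # default bit-score cutoff described in the text
RELAXED_THRESHOLD = 8.0          # lowered cutoff described in the text


@dataclass
class DomainHit:
    query: str
    accession: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    bit_score: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def sulfatase_hits(hits: list[DomainHit], threshold: float = DEFAULT_THRESHOLD) -> list[DomainHit]:
    """Keep hits to the sulfatase HMM whose bit score meets the threshold."""
    return [h for h in hits if h.accession == SULFATASE_ACCESSION and h.bit_score >= threshold]


if __name__ == "__main__":
    hits = [  # hypothetical parsed search results
        DomainHit("query_1", "PF00884", 40, 300, 120.0),
        DomainHit("query_2", "PF00884", 10, 120, 11.0),
        DomainHit("query_3", "PF00001", 5, 250, 40.0),
    ]
    for cutoff in (DEFAULT_THRESHOLD, RELAXED_THRESHOLD):
        kept = [(h.query, h.length) for h in sulfatase_hits(hits, cutoff)]
        print(cutoff, kept)
```

Lowering the cutoff from 15 to 8 bits admits the weaker query_2 hit, which illustrates the sensitivity/specificity trade-off the text alludes to.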

Example 2: Tissue Distribution of 22438 mRNA

Northern blot hybridizations with various RNA samples are performed under
standard conditions and washed under stringent conditions, i.e., 0.2X SSC at 65°C.
A DNA probe corresponding to all or a portion of the 22438 cDNA (SEQ ID NO:2)
can be used. The DNA is radioactively labeled with 32P-dCTP using the Prime-It Kit
(Stratagene, La Jolla, CA) according to the instructions of the supplier. Filters
containing mRNA from mouse hematopoietic and endocrine tissues, and cancer cell
lines (Clontech, Palo Alto, CA) are probed in ExpressHyb hybridization solution
(Clontech) and washed at high stringency according to the manufacturer's
recommendations.

Example 3: Identification and Characterization of Human 23553 cDNAs

The human 23553 sequence (Figure 5A-B; SEQ ID NO:4), which is
approximately 4321 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 2616 nucleotides
(nucleotides 510-3125 of SEQ ID NO:4; SEQ ID NO:12). The coding sequence
encodes an 871 amino acid protein (SEQ ID NO:3).

PFAM analysis indicates that 23553 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/software/packages/pfam/pfam.html. An alignment of the
sulfatase domain (amino acids 43 to 467 of SEQ ID NO:3) of human 23553-like protein
with a consensus amino acid sequence derived from a hidden Markov model is
depicted in Figure 20. For further information on sulfatase domains, see Example 1.

In one embodiment, a 23553-like protein includes at least one transmembrane
domain. As used herein, the term "transmembrane domain" includes an amino acid
sequence of about 15 amino acid residues in length that spans a phospholipid
membrane. More preferably, a transmembrane domain includes at least about 18, 20,
22, or 24 amino acid residues and spans a phospholipid membrane. Transmembrane
domains are rich in hydrophobic residues and typically have an α-helical structure.
In a preferred embodiment, at least 50%, 60%, 70%, 80%, 90%, 95% or more of the
amino acids of a transmembrane domain are hydrophobic, e.g., leucines, isoleucines,
tyrosines, or tryptophans. Transmembrane domains are described in, for example,
http://pfam.wustl.edu/cgi-bin/getdesc?name=7tm-1, and Zagotta W.N. et al. (1996)
Annual Rev. Neurosci. 19:235-63, the contents of which are incorporated herein by
reference.
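The percentage criterion above (at least 50% hydrophobic residues over a stretch of at least about 18 residues) can be applied as a sliding-window scan. The Python sketch below is only an illustration of that criterion, not a transmembrane predictor; the hydrophobic residue set (A, V, L, I, M, F, W, Y) is an assumption, since the text gives examples rather than a full list, and the toy sequence is hypothetical. Established tools typically use hydropathy scales (e.g., Kyte-Doolittle) or trained models instead.

```python
# Assumed hydrophobic residue set (illustrative; the text gives only examples).
HYDROPHOBIC = set("AVLIMFWY")


def hydrophobic_fraction(segment: str) -> float:
    """Fraction of residues in `segment` that belong to HYDROPHOBIC."""
    if not segment:
        raise ValueError("empty segment")
    return sum(aa in HYDROPHOBIC for aa in segment.upper()) / len(segment)


def scan_windows(seq: str, window: int = 18) -> list[tuple[int, int, float]]:
    """(start, end, fraction) for every window, using 1-based inclusive coordinates."""
    if window > len(seq):
        raise ValueError("window longer than sequence")
    return [(i + 1, i + window, hydrophobic_fraction(seq[i:i + window]))
            for i in range(len(seq) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 18, min_fraction: float = 0.5):
    """Windows meeting the 'at least 50% hydrophobic' criterion by default."""
    return [w for w in scan_windows(seq, window) if w[2] >= min_fraction]


def best_window(seq: str, window: int = 18) -> tuple[int, int, float]:
    """Most hydrophobic window; the first one wins ties."""
    return max(scan_windows(seq, window), key=lambda w: w[2])


if __name__ == "__main__":
    toy = "D" * 20 + "L" * 18 + "K" * 20  # hypothetical sequence with one hydrophobic stretch
    print(best_window(toy))                # (21, 38, 1.0)
    print(len(candidate_tm_windows(toy)))  # 19 overlapping windows reach 50%
```

Note that a 50% threshold is permissive: windows that only half-overlap the hydrophobic stretch also pass, which is why the sketch also reports the single most hydrophobic window.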

In a preferred embodiment, a 23553-like polypeptide or protein has at least
one transmembrane domain or a region which includes at least 18, 20, 22, or 24 amino
acid residues and has at least about 60%, 70%, 80%, 90%, 95%, 99%, or 100%
sequence identity with a "transmembrane domain," e.g., at least one transmembrane
domain of human 23553 (e.g., amino acid residues 7 to 25 of SEQ ID NO:3).

In another embodiment, a 23553 protein includes at least one "non-transmembrane
domain." As used herein, "non-transmembrane domains" are
domains that reside outside of the membrane. When referring to plasma membranes,
non-transmembrane domains include extracellular domains (i.e., outside of the cell)
and intracellular domains (i.e., within the cell). When referring to membrane-bound
proteins found in intracellular organelles (e.g., mitochondria, endoplasmic reticulum,
peroxisomes and microsomes), non-transmembrane domains include those domains of
the protein that reside in the cytosol (i.e., the cytoplasm), the lumen of the organelle,
or the matrix or the intermembrane space (the latter two relate specifically to
mitochondrial organelles). The C-terminal amino acid residue of a non-transmembrane
domain is adjacent to an N-terminal amino acid residue of a transmembrane domain
in a naturally occurring 23553-like protein.

In a preferred embodiment, a 23553-like polypeptide or protein has a
"non-transmembrane domain" or a region which includes at least about 1-350,
preferably about 200-320, more preferably about 230-300, and even more preferably
about 240-280 amino acid residues, and has at least about 60%, 70%, 80%, 90%, 95%,
99%, or 100% sequence identity with a "non-transmembrane domain," e.g., a
non-transmembrane domain of human 23553-like protein.

A non-transmembrane domain located at the N-terminus of a 23553-like
protein or polypeptide is referred to herein as an "N-terminal non-transmembrane
domain." As used herein, an "N-terminal non-transmembrane domain" includes an
amino acid sequence of about 1-100 amino acid residues in length. For example, an
N-terminal non-transmembrane domain is located at about amino acid residues 1 to 6
of SEQ ID NO:3.

Similarly, a non-transmembrane domain located at the C-terminus of a 23553-like
protein or polypeptide is referred to herein as a "C-terminal non-transmembrane
domain." As used herein, a "C-terminal non-transmembrane domain" includes an
amino acid sequence having about 1-800, preferably about 15-500, preferably about
20-270, more preferably about 25-255 amino acid residues in length and is located
outside the boundaries of a membrane. For example, a C-terminal non-transmembrane
domain is located at about amino acid residues 26-871 of SEQ ID NO:3.

The ORF analyzer predicts that 23553 has a signal peptide. Therefore, a
23553-like molecule can further include a signal sequence. As used herein, a "signal
sequence" refers to a peptide of about 20-80 amino acid residues in length which
occurs at the N-terminus of secretory and integral membrane proteins and which
contains a majority of hydrophobic amino acid residues. For example, a signal
sequence contains at least about 12-25 amino acid residues, preferably about 30-70
amino acid residues, and has at least about 40-70%, preferably about 50-65%, and
more preferably about 55-60% hydrophobic amino acid residues (e.g., alanine, valine,
leucine, isoleucine, phenylalanine, tyrosine, tryptophan, or proline). Such a "signal
sequence", also referred to in the art as a "signal peptide", serves to direct a protein
containing such a sequence to a lipid bilayer. For example, in one embodiment, a 
23553-like protein contains a signal sequence of about amino acids 1-22 of SEQ ID 
NO:3. The "signal sequence" is cleaved during processing of the mature protein. The 
mature 23553-like protein corresponds to amino acids 23-871 of SEQ ID NO:3. 
CLUSTAL multiple sequence alignment analysis shows homology between
23553 and the following sequences (identified by GenBank accession number):
P14217, Chlamydomonas reinhardtii arylsulfatase; Q10723, Volvox carteri
arylsulfatase; CAB40661, human N-acetylglucosamine-6-sulfatase homolog; P15586,
human N-acetylglucosamine-6-sulfatase; P50426, goat N-acetylglucosamine-6-sulfatase;
AAA83618, C. elegans putative sulfatase; AAC02716, Neurospora crassa
arylsulfatase; and P31447, E. coli hypothetical sulfatase.



Example 4: Tissue Distribution of 23553 mRNA 

In normal human tissues tested, high expression of 23553 was observed in
trachea, vein, osteoblast, kidney, and testes. Significant expression of 23553 was
found in adipose, colon, skeletal muscle, thyroid, prostate, and other tissues. See
Figure 25. In comparisons of normal and tumor tissue, 23553 expression was detected
in all samples tested, with increased expression in breast, colon, and lung tumors. See
Figure 26. Further, elevated expression of 23553 was found in glioblastoma samples
as compared to normal brain tissue samples. Expression levels were determined by
quantitative PCR (Taqman® brand quantitative PCR kit, Applied Biosystems). The
quantitative PCR reactions were performed according to the kit manufacturer's
instructions.
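The text states that expression levels were measured by quantitative PCR but not how relative levels were calculated. One widely used approach for comparing tumor and normal samples is the comparative Ct (2^-ΔΔCt) method, sketched below in Python as an assumption rather than as the method actually used here; the Ct values and the use of a single reference gene are hypothetical.

```python
def ddct_fold_change(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float,
                     amplification: float = 2.0) -> float:
    """Relative expression (test vs. control) by the comparative Ct method.

    Assumes target and reference assays both amplify with ~100% efficiency,
    i.e., the product doubles each cycle (amplification = 2).
    """
    delta_test = ct_target_test - ct_ref_test  # normalize test sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control sample to reference gene
    delta_delta = delta_test - delta_ctrl
    return amplification ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical mean Ct values: target gene vs. a reference gene in tumor and normal tissue.
    fold = ddct_fold_change(ct_target_test=24.0, ct_ref_test=18.0,
                            ct_target_ctrl=26.0, ct_ref_ctrl=18.0)
    print(f"{fold:.1f}-fold higher in tumor")  # 4.0-fold
```

A lower Ct means more starting template, so a target that crosses threshold two cycles earlier (after normalization) corresponds to roughly four-fold higher expression under the doubling assumption.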

cDNA library array analysis of 23553 revealed expression in adipose, adrenal
gland, bone, brain, colon, colon metastases to liver, endothelial, heart, liver, lung,
muscle, osteoblast, skin, testes, thyroid, and other tissue. Reverse transcriptase
polymerase chain reaction (RT-PCR) revealed 23553 expression in clinical samples of
normal and tumor colon tissue, normal and metastatic liver tissue, and in lung
squamous cell carcinoma tissue. In situ hybridization showed expression of 23553 in
the following tissues: 3 of 3 breast tumor; 0 of 2 normal breast; 4 of 4 lung tumor; 0
of 2 normal lung; 4 of 4 colon tumor; and 2 of 2 liver metastases. In all cases,
expression of 23553 was confined to the stromal component of tissue; no expression
was detected in normal or tumor epithelium.

Angiogenic growth factors (e.g., bFGF) are present in the extracellular matrix
(ECM) and can be released from the ECM by heparinase-like enzymes, which
include the glycosyl sulfatases. The released growth factors in turn stimulate blood
vessel formation. See Baird A., Ling N., "Fibroblast growth factors are present in the
extracellular matrix produced by endothelial cells in vitro: implications for a role of
heparinase-like enzymes in the neovascular response," Biochem. Biophys. Res.
Commun. (1987) 142(2):428-35.

As noted, 23553 has amino acid sequence features that place it in the class of
glycosyl sulfate cleaving enzymes. Taqman results (above) show that its expression
is elevated in clinical tumor samples. In situ hybridization shows specific, localized
23553 expression in the tumor stromal component of all tumor samples tested,
whereas its expression is low or absent in normal tissues. This suggests that, through
its catalytic activity, 23553 promotes tumor growth or is involved in tumor
maintenance by degrading the ECM and releasing growth factors.

Example 5: Identification and Characterization of Human 25278 cDNAs 

The human 25278 sequence (Figure 10A-B; SEQ ID NO:6), which is
approximately 2940 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1710 nucleotides
(nucleotides 334-2043 of SEQ ID NO:6; SEQ ID NO:13). The coding sequence
encodes a 569 amino acid protein (SEQ ID NO:5).

PFAM analysis indicates that 25278 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/software/packages/pfam/pfam.html. An alignment of the
sulfatase domain (amino acids 47 to 471 of SEQ ID NO:5) of human 25278 with a
consensus amino acid sequence derived from a hidden Markov model is depicted in
Figure 27. For further information on sulfatase domains, see Example 1.
Example 6: Identification and Characterization of Human 26212 cDNAs

The human 26212 sequence (Figure 15; SEQ ID NO:8), which is
approximately 2253 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1800 nucleotides
(nucleotides 324-2123 of SEQ ID NO:8; SEQ ID NO:14). The coding sequence
encodes a 599 amino acid protein (SEQ ID NO:7).
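The coordinates and lengths stated in Examples 1, 3, 5 and 6 can be cross-checked with simple arithmetic: an inclusive range from nucleotide a to b spans b - a + 1 nucleotides, and if the range includes the stop codon it should equal three times the protein length plus three. The short Python check below applies this to the four clones; it assumes the stated ranges include the stop codon, which is what the numbers imply (for example, 1825 - 248 + 1 = 1578 = 3 × 526 for the 525-residue 22438 protein).

```python
# (first nucleotide, last nucleotide, protein length in amino acids) as stated in the examples.
CODING_REGIONS = {
    "22438": (248, 1825, 525),
    "23553": (510, 3125, 871),
    "25278": (334, 2043, 569),
    "26212": (324, 2123, 599),
}


def cds_length(first: int, last: int) -> int:
    """Length of an inclusive nucleotide range."""
    return last - first + 1


def is_consistent(first: int, last: int, protein_length: int) -> bool:
    """True if the range is whole codons: one per residue plus one stop codon."""
    n = cds_length(first, last)
    return n % 3 == 0 and n // 3 == protein_length + 1


if __name__ == "__main__":
    for clone, (first, last, aa) in CODING_REGIONS.items():
        n = cds_length(first, last)
        print(clone, n, "nt", n // 3, "codons", aa, "aa", is_consistent(first, last, aa))
```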

PFAM analysis indicates that 26212 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/software/packages/pfam/pfam.html. An alignment of the
sulfatase domain (amino acids 76-502 of SEQ ID NO:7) of human 26212 with a
consensus amino acid sequence derived from a hidden Markov model is depicted in
Figure 29. For further information on sulfatase domains, see Example 1.
In one embodiment, a 26212-like protein includes at least one transmembrane
domain. As used herein, the term "transmembrane domain" includes an amino acid
sequence of about 15 amino acid residues in length that spans a phospholipid
membrane. More preferably, a transmembrane domain includes at least about 18, 20,
22, or 24 amino acid residues and spans a phospholipid membrane. For more
information on transmembrane domains, see Example 3.

In a preferred embodiment, a 26212-like polypeptide or protein has at least
one transmembrane domain or a region which includes at least 18, 20, 22, 24, 25, or
30 amino acid residues and has at least about 60%, 70%, 80%, 90%, 95%, 99%, or
100% sequence identity with a "transmembrane domain," e.g., at least one
transmembrane domain of human 26212-like polypeptide or protein (e.g., amino acid
residues 24 to 44 of SEQ ID NO:7).

In another embodiment, a 26212-like protein includes at least one
"non-transmembrane domain." The C-terminal amino acid residue of a
non-transmembrane domain is adjacent to an N-terminal amino acid residue of a
transmembrane domain in a naturally occurring 26212-like protein. For more
information on non-transmembrane domains, see Example 3.
In a preferred embodiment, a 26212-like polypeptide or protein has a
"non-transmembrane domain" or a region which includes at least about 1-350,
preferably about 200-320, more preferably about 230-300, and even more preferably
about 240-280 amino acid residues, and has at least about 60%, 70%, 80%, 90%, 95%,
99%, or 100% sequence identity with a "non-transmembrane domain," e.g., a
non-transmembrane domain of human 26212-like polypeptide or protein. An N-terminal
non-transmembrane domain is located at about amino acid residues 1 to 23 of SEQ ID
NO:7. A C-terminal non-transmembrane domain is located at about amino acid
residues 45 to 599 of SEQ ID NO:7.
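Because the text gives domain boundaries as residue ranges, a quick consistency check is whether the N-terminal non-transmembrane, transmembrane, and C-terminal non-transmembrane segments tile each protein end to end. The Python sketch below treats the approximate ("about") boundaries given in Example 3 for 23553 and above for 26212 as exact, using 1-based inclusive coordinates.

```python
# (protein length, [(segment, first residue, last residue), ...]) per the text.
TOPOLOGY = {
    "23553": (871, [("N-terminal non-transmembrane", 1, 6),
                    ("transmembrane", 7, 25),
                    ("C-terminal non-transmembrane", 26, 871)]),
    "26212": (599, [("N-terminal non-transmembrane", 1, 23),
                    ("transmembrane", 24, 44),
                    ("C-terminal non-transmembrane", 45, 599)]),
}


def tiles_protein(length: int, segments: list[tuple[str, int, int]]) -> bool:
    """True if segments are contiguous, non-overlapping, and cover residues 1..length."""
    next_start = 1
    for _, first, last in segments:
        if first != next_start or last < first:
            return False
        next_start = last + 1
    return next_start == length + 1


def segment_lengths(segments: list[tuple[str, int, int]]) -> dict[str, int]:
    """Length of each named segment (inclusive coordinates)."""
    return {name: last - first + 1 for name, first, last in segments}


if __name__ == "__main__":
    for protein, (length, segments) in TOPOLOGY.items():
        print(protein, tiles_protein(length, segments), segment_lengths(segments))
```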
A 26212-like molecule can further include a signal sequence. For more
information on signal sequences, see Example 3.



Example 7: Tissue Distribution of 26212 mRNA

In six independent experiments, 26212 showed higher levels of expression in
proliferating endothelial cells as compared to arrested endothelial cells. 26212
expression was also higher in proliferating endothelial cells than in non-endothelial
cells. See Figure 30. 26212 expression levels were also upregulated in breast tissue
cell lines treated with epidermal growth factor. See Figure 34. 26212 is expressed
in hemangiomas and other angiogenic tissues, including fetal heart, uterine
adenocarcinoma, and endometrial polyps. See Figure 35. Endothelial and glial cells
showed higher levels of 26212 expression as compared to other tissues and cells. See
Figure 36. 26212 also showed higher levels of expression in some lung, breast, and
brain tumors as compared to normal tissues. Expression levels of 26212 were also
higher in proliferating endothelial cells than in tumors. Expression levels
were determined by quantitative PCR (Taqman® brand quantitative PCR kit, Applied
Biosystems). The quantitative PCR reactions were performed according to the kit
manufacturer's instructions.

In situ hybridization analysis was also carried out. 26212 showed weak
expression in ovarian tumor and no expression in normal ovary. Similarly, colon
metastases showed weak expression of 26212, whereas normal colon tissue and primary
tumors showed no expression. A subset of lung tumors tested showed expression of
26212, while no expression was detected in normal lung.
Angiogenic growth factors (e.g., bFGF) are present in the extracellular matrix
(ECM) and can be released from the ECM by heparinase-like enzymes, which
include the glycosyl sulfatases. The released growth factors in turn stimulate blood
vessel formation by, e.g., attracting endothelial cells to form new vessels. See Baird
A., Ling N., "Fibroblast growth factors are present in the extracellular matrix produced
by endothelial cells in vitro: implications for a role of heparinase-like enzymes in the
neovascular response," Biochem. Biophys. Res. Commun. (1987) 142(2):428-35.

As noted, 26212 has amino acid sequence features that place it in the class of
glycosyl sulfate cleaving enzymes. Taqman results (above) show that its expression
is elevated in proliferating endothelial cells, suggesting that 26212 is specifically
involved at sites of active angiogenesis.

Example 8: Recombinant Expression of 22438, 23553, 25278, or 26212 in Bacterial
Cells

In this example, 22438, 23553, 25278, or 26212 is expressed as a recombinant
glutathione-S-transferase (GST) fusion polypeptide in E. coli, and the fusion
polypeptide is isolated and characterized. Specifically, 22438, 23553, 25278, or
26212 is fused to GST, and this fusion polypeptide is expressed in E. coli, e.g., strain
PEB199. Expression of the GST fusion protein (e.g., GST-26212) in PEB199 is
induced with IPTG. The recombinant fusion polypeptide is purified from crude
bacterial lysates of the induced PEB199 strain by affinity chromatography on
glutathione beads. Using polyacrylamide gel electrophoretic analysis of the
polypeptide purified from the bacterial lysates, the molecular weight of the resultant
fusion polypeptide is determined.
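Before running the gel, it can help to estimate the size expected for each fusion. The Python sketch below assumes a GST tag of about 26 kDa (the size of the Schistosoma japonicum GST used in common pGEX-type vectors; the text does not name the vector) and an average residue mass of about 110 Da, a common rule of thumb. Linker residues and any signal-peptide processing are ignored, so the figures are rough estimates only.

```python
GST_TAG_KDA = 26.0      # assumed size of the GST tag (S. japonicum GST, as in pGEX-type vectors)
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average mass of one amino acid residue

# Full-length protein sizes stated in Examples 1, 3, 5 and 6.
PROTEIN_LENGTHS = {"22438": 525, "23553": 871, "25278": 569, "26212": 599}


def estimated_fusion_kda(n_residues: int, tag_kda: float = GST_TAG_KDA,
                         avg_residue_da: float = AVG_RESIDUE_DA) -> float:
    """Rough fusion size: tag plus residue count times average residue mass."""
    return tag_kda + n_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    for clone, n in PROTEIN_LENGTHS.items():
        print(f"GST-{clone}: ~{estimated_fusion_kda(n):.0f} kDa")
```

The observed SDS-PAGE mobility can deviate from such estimates, so the estimate is best used to pick a gel percentage and marker range rather than to confirm identity.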


Example 9: Expression of Recombinant 22438, 23553, 25278, or 26212 Protein in
COS Cells

To express the 22438, 23553, 25278, or 26212 gene in COS cells, the
pcDNA/Amp vector from Invitrogen Corporation (San Diego, CA) is used. This vector
contains an SV40 origin of replication, an ampicillin resistance gene, an E. coli
replication origin, a CMV promoter followed by a polylinker region, and an SV40
intron and polyadenylation site. A DNA fragment encoding the entire 22438, 23553,
25278, or 26212 protein and an HA tag (Wilson et al. (1984) Cell 37:767) or a FLAG
tag fused in-frame to the 3' end of the fragment is cloned into the polylinker region of
the vector, thereby placing the expression of the recombinant protein under the
control of the CMV promoter.
To construct the plasmid, the 22438, 23553, 25278, or 26212 DNA sequence
is amplified by PCR using two primers. The 5' primer contains the restriction site of
interest followed by approximately twenty nucleotides of the 22438, 23553, 25278, or
26212 coding sequence starting from the initiation codon; the 3' primer contains
sequences complementary to the other restriction site of interest, a translation
stop codon, the HA tag or FLAG tag, and the last 20 nucleotides of the 22438, 23553,
25278, or 26212 coding sequence. The PCR-amplified fragment and the
pcDNA/Amp vector are digested with the appropriate restriction enzymes, and the
vector is dephosphorylated using the CIAP enzyme (New England Biolabs, Beverly,
MA). Preferably, the two restriction sites chosen are different so that the 22438,
23553, 25278, or 26212 gene is inserted in the correct orientation. The ligation
mixture is transformed into E. coli cells (strains HB101, DH5α, or SURE, available
from Stratagene Cloning Systems, La Jolla, CA, can be used), the transformed culture
is plated on ampicillin media plates, and resistant colonies are selected. Plasmid
DNA is isolated from transformants and examined by restriction analysis for the
presence of the correct fragment.
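The primer layout described above can be sketched in Python. This example assumes EcoRI (GAATTC) and XhoI (CTCGAG) as the two distinct restriction sites, an HA tag encoded by one common codon choice for YPYDVPDYA, and TGA as the added stop codon; it also drops the native stop codon from the coding sequence so that translation reads through into the tag, which the text implies but does not state. Practical primers usually add a few extra bases 5' of each restriction site and are checked for melting temperature and for internal copies of the chosen sites; those steps are omitted here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}
HA_TAG = "TACCCATACGATGTTCCAGATTACGCT"  # encodes YPYDVPDYA; one common codon choice (assumption)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cds: str, site_5: str = "GAATTC", site_3: str = "CTCGAG",
                   tag: str = HA_TAG, stop: str = "TGA", anneal: int = 20) -> tuple[str, str]:
    """Forward and reverse primers for a C-terminally tagged insert (hypothetical sites).

    Forward: 5' site + first `anneal` nt starting at the ATG.
    Reverse: reverse complement of (last `anneal` coding nt + tag + stop + 3' site).
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should start with the initiation codon")
    # Drop the native stop codon so translation continues into the tag (assumption).
    body = cds[:-3] if cds[-3:] in STOP_CODONS else cds
    if len(body) % 3 != 0:
        raise ValueError("coding sequence is not a whole number of codons")
    forward = site_5 + body[:anneal]
    reverse = reverse_complement(body[-anneal:] + tag + stop + site_3)
    return forward, reverse


if __name__ == "__main__":
    toy_cds = "ATG" + "GCA" * 12 + "AAG" * 8 + "TAA"  # hypothetical ORF, not a real clone
    fwd, rev = design_primers(toy_cds)
    print("forward:", fwd)
    print("reverse:", rev)
```

Because the tag is appended directly after the last sense codon, it stays in frame regardless of how many nucleotides the primer anneals over.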

COS cells are subsequently transfected with the 22438, 23553, 25278, or
26212-pcDNA/Amp plasmid DNA using the calcium phosphate or calcium chloride
co-precipitation methods, DEAE-dextran-mediated transfection, lipofection, or
electroporation. Other suitable methods for transfecting host cells can be found in
Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A Laboratory
Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY,
1989. The expression of the 22438, 23553, 25278, or 26212 polypeptide is detected
by radiolabeling (35S-methionine or 35S-cysteine available from NEN, Boston, MA,
can be used) and immunoprecipitation (Harlow, E. and Lane, D. Antibodies: A
Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY,
1988) using an HA-specific monoclonal antibody.
Briefly, the cells are labeled for 8 hours with 35S-methionine (or 35S-cysteine). The
culture media are then collected and the cells are lysed using detergents (RIPA buffer:
150 mM NaCl, 1% NP-40, 0.1% SDS, 0.5% DOC, 50 mM Tris, pH 7.5). Both the
cell lysate and the culture media are precipitated with an HA-specific monoclonal
antibody. Precipitated polypeptides are then analyzed by SDS-PAGE.
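For preparing the lysis buffer, the stated concentrations convert into weighed amounts with the usual mass = concentration × volume × molar mass relation. The Python sketch below assumes NP-40 is given as % v/v and SDS and sodium deoxycholate (DOC) as % w/v, and uses molar masses of 58.44 g/mol for NaCl and 121.14 g/mol for Tris base; the 100 mL batch size is arbitrary.

```python
MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "Tris base": 121.14}


def ripa_recipe(volume_ml: float) -> dict[str, float]:
    """Amounts for the RIPA buffer composition given in the text, for `volume_ml` final volume.

    Assumes NP-40 is % v/v and SDS and deoxycholate are % w/v.
    """
    volume_l = volume_ml / 1000.0
    return {
        "NaCl (g)": 0.150 * volume_l * MOLAR_MASS_G_PER_MOL["NaCl"],            # 150 mM
        "Tris base (g)": 0.050 * volume_l * MOLAR_MASS_G_PER_MOL["Tris base"],  # 50 mM
        "NP-40 (mL)": 1.0 / 100 * volume_ml,                                    # 1% v/v
        "SDS (g)": 0.1 / 100 * volume_ml,                                       # 0.1% w/v
        "sodium deoxycholate (g)": 0.5 / 100 * volume_ml,                       # 0.5% w/v
    }


if __name__ == "__main__":
    for component, amount in ripa_recipe(100.0).items():
        print(f"{component}: {amount:.4g}")
    # Dissolve in less than the final volume, adjust to pH 7.5 with HCl, then bring to volume.
```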
Alternatively, DNA containing the 22438, 23553, 25278, or 26212 coding
sequence is cloned directly into the polylinker of the pcDNA/Amp vector using the
appropriate restriction sites. The resulting plasmid is transfected into COS cells in the
manner described above, and the expression of the 22438, 23553, 25278, or 26212
polypeptide is detected by radiolabeling and immunoprecipitation using a 22438,
23553, 25278, or 26212 specific monoclonal antibody.

This invention may be embodied in many different forms and should not be
construed as limited to the embodiments set forth herein; rather, these embodiments are
provided so that this disclosure will fully convey the invention to those skilled in the art.
Many modifications and other embodiments of the invention will come to mind to one
skilled in the art to which this invention pertains having the benefit of the teachings
presented in the foregoing description. Although specific terms are employed, they are
used as in the art unless otherwise indicated.
Applicant's or agent's




International application No. 




file reference 


35800/208709 


PCT/US01/ 





INDICATIONS RELATING TO DEPOSITED MICROORGANISM 
OR OTHER BIOLOGICAL MATERIAL 

(PCT Rule 13bis)



A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 5, line 31



B. IDENTIFICATION OF DEPOSIT 



Further deposits are identified on an additional sheet \Z\ 



Name of depository institution 

American Type Cxilture Collection 



Address of depositary institution (including postal code and country) 

10801 University Blvd. 
Manassas, VA 201 10-2209 US 



Date of deposit 



Accession Number 



PTA- 



C. ADDITIONAL INDICATIONS (leave blank if not applicable) 



This information is continued on an additional sheet [Zl 



Page 17, line 12; page 22. line 9; page 23, line 23; page 108, lines 7, 13, 17, 21, 24 and 29; page 109, 
lines 8 and 13; page 110, lines 2. 6, 13 and 22; page 111, lines 1, 6, 9 and 13. 



D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indicators are not for all designated States) 



E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 



The indications listed below will be submitted to the Intemational Bureau later (specify the general nature of the indications e.g., "Accession 
Number of Deposit") 

Accession Number of Deposit and Date of Deposit 



Ej^Tl 



For receiving Office use only 



Ttiis sheet was received witti the intemational application 
Authorized offic^i^^ a3BI»)]^ SR 




For International Bureau use only 



O This sheet was received with the International Bureau on: 



Authorized officer 
Applicant's or agent's file reference: 35800/208709
International application No.: PCT/US01/

INDICATIONS RELATING TO DEPOSITED MICROORGANISM
OR OTHER BIOLOGICAL MATERIAL
(PCT Rule 13bis)

A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 5, line 31.

B. IDENTIFICATION OF DEPOSIT
Further deposits are identified on an additional sheet
Name of depositary institution: American Type Culture Collection
Address of depositary institution (including postal code and country): 10801 University Blvd., Manassas, VA 20110-2209 US
Date of deposit: 05 April 2000 (05.04.00)
Accession Number: PTA-1639

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
This information is continued on an additional sheet
Page 17, line 12; page 22, line 10; page 23, line 23; page 108, lines 7, 13, 17, 21, 24 and 29; page 109, lines 8 and 13; page 110, lines 2, 6, 13 and 22; page 111, lines 1, 6, 9 and 13.

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only
This sheet was received with the international application

For International Bureau use only
This sheet was received with the International Bureau on:
Authorized officer

Form PCT/RO/134 (July 1998)
Applicant's or agent's file reference: 35800/208709
International application No.: PCT/US01/

INDICATIONS RELATING TO DEPOSITED MICROORGANISM
OR OTHER BIOLOGICAL MATERIAL
(PCT Rule 13bis)

A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 5, line 31.

B. IDENTIFICATION OF DEPOSIT
Further deposits are identified on an additional sheet
Name of depositary institution: American Type Culture Collection
Address of depositary institution (including postal code and country): 10801 University Blvd., Manassas, VA 20110-2209 US
Date of deposit: 09 May 2000 (09.05.00)
Accession Number: PTA-1846

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
This information is continued on an additional sheet
Page 17, line 12; page 22, line 10; page 23, line 23; page 108, lines 7, 13, 17, 21, 24 and 29; page 109, lines 8 and 13; page 110, lines 2, 6, 13 and 22; page 111, lines 1, 6, 9 and 13.

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only
This sheet was received with the international application

For International Bureau use only
This sheet was received with the International Bureau on:
Authorized officer

Form PCT/RO/134 (July 1998)
Applicant's or agent's file reference: 35800/208709
International application No.: PCT/US01/

INDICATIONS RELATING TO DEPOSITED MICROORGANISM
OR OTHER BIOLOGICAL MATERIAL
(PCT Rule 13bis)

A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 5, line 32.

B. IDENTIFICATION OF DEPOSIT
Further deposits are identified on an additional sheet
Name of depositary institution: American Type Culture Collection
Address of depositary institution (including postal code and country): 10801 University Blvd., Manassas, VA 20110-2209 US
Date of deposit:
Accession Number: PTA-

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
This information is continued on an additional sheet
Page 17, line 12; page 22, line 10; page 23, line 23; page 108, lines 8, 13, 17, 21, 24 and 29; page 109, lines 9 and 13; page 110, lines 2, 6, 13 and 22; page 111, lines 2, 6, 9 and 13.

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit"):
Accession Number of Deposit and Date of Deposit

For receiving Office use only
This sheet was received with the international application
Authorized officer

For International Bureau use only
This sheet was received with the International Bureau on:
Authorized officer

Form PCT/RO/134 (July 1998)
THAT WHICH IS CLAIMED:

1. An isolated nucleic acid molecule selected from the group consisting of:

a) a nucleic acid molecule comprising a nucleotide sequence which is at least 60%
identical to the nucleotide sequence of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14, or the
nucleotide sequence of the cDNA insert of the plasmid deposited with ATCC as Patent
Deposit Number , PTA-1639, PTA-1846, or , wherein said nucleotide sequence encodes
a polypeptide having biological activity;

b) a nucleic acid molecule comprising a fragment of at least 20 nucleotides of the
nucleotide sequence of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14, or the nucleotide
sequence of the cDNA insert of the plasmid deposited with ATCC as Patent Deposit
Number , PTA-1639, PTA-1846, or ;

c) a nucleic acid molecule which encodes a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the cDNA
insert of the plasmid deposited with the ATCC as Patent Deposit Number , PTA-1639,
PTA-1846, or ;

d) a nucleic acid molecule which encodes a fragment of a polypeptide comprising the
amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by
the cDNA insert of the plasmid deposited with the ATCC as Patent Deposit Number ,
PTA-1639, PTA-1846, or , wherein the fragment comprises at least 15 contiguous amino
acids of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the cDNA
insert of the plasmid deposited with the ATCC as Patent Deposit Number , PTA-1639,
PTA-1846, or ;

e) a nucleic acid molecule which encodes a naturally occurring allelic variant of a
biologically active polypeptide comprising the amino acid sequence of SEQ ID NO:1, 3,
5, or 7, or the amino acid sequence encoded by the cDNA insert of the plasmid deposited
with the ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or , wherein the
nucleic acid molecule hybridizes to a nucleic acid molecule comprising the complement
of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14 under stringent conditions; and

f) a nucleic acid molecule comprising the complement of a), b).



2. The isolated nucleic acid molecule of claim 1, which is selected from the group
consisting of:

a) a nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO:2, 4, 6, 8,
11, 12, 13, 14, the cDNA insert of any one of the plasmids deposited with ATCC as
Patent Deposit Number , PTA-1639, PTA-1846, or , or a complement thereof; and

b) a nucleic acid molecule which encodes a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or an amino acid sequence encoded by the cDNA
insert of any of the plasmids deposited with ATCC as Patent Deposit Number ,
PTA-1639, PTA-1846, or .

3. The nucleic acid molecule of claim 1 further comprising vector nucleic acid
sequences.

4. The nucleic acid molecule of claim 1 further comprising nucleic acid sequences
encoding a heterologous polypeptide.

5. A host cell which contains the nucleic acid molecule of claim 1.

6. The host cell of claim 5 which is a mammalian host cell.

7. A nonhuman mammalian host cell containing the nucleic acid molecule of claim 1.

8. An isolated polypeptide selected from the group consisting of:

a) a biologically active polypeptide which is encoded by a nucleic acid molecule
comprising a nucleotide sequence which is at least 60% identical to a nucleic acid
comprising the nucleotide sequence of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14 or the
nucleotide sequence of the cDNA insert of the plasmid deposited with ATCC as Patent
Deposit Number , PTA-1639, PTA-1846, or ;

b) a naturally occurring allelic variant of a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the cDNA
insert of the plasmid deposited with the ATCC as Patent Deposit Number , PTA-1639,
PTA-1846, or , wherein the polypeptide is encoded by a nucleic acid molecule which
hybridizes to a nucleic acid molecule comprising the complement of SEQ ID NO:2, 4, 6,
8, 11, 12, 13, or 14 under stringent conditions;

c) a fragment of a polypeptide comprising the amino acid sequence of SEQ ID NO:1, 3,
5, or 7, or the amino acid sequence encoded by the cDNA insert of the plasmid deposited
with the ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or , wherein the
fragment comprises at least 15 contiguous amino acids of SEQ ID NO:1, 3, 5, or 7; and

d) a polypeptide having at least 60% sequence identity to the amino acid sequence of
SEQ ID NO:1, 3, 5, or 7, wherein the polypeptide has biological activity.

9. The isolated polypeptide of claim 8 comprising the amino acid sequence of SEQ ID
NO:1, 3, 5, or 7, or an amino acid sequence encoded by the cDNA insert of any of the
plasmids deposited with ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or .

10. The polypeptide of claim 8 further comprising heterologous amino acid sequences.

11. An antibody which selectively binds to a polypeptide of claim 8.



12. A method for producing a polypeptide selected from the group consisting of:

a) a polypeptide comprising the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the
amino acid sequence encoded by the cDNA insert of the plasmid deposited with the
ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or ;

b) a polypeptide comprising a fragment of the amino acid sequence of SEQ ID NO:1, 3,
5, or 7, or the amino acid sequence encoded by the cDNA insert of the plasmid deposited
with the ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or , wherein the
fragment comprises at least 15 contiguous amino acids of SEQ ID NO:1, 3, 5, or 7, or
the amino acid sequence encoded by the cDNA insert of the plasmid deposited with the
ATCC as Patent Deposit Number , PTA-1639, PTA-1846, or ;

c) a biologically active naturally occurring allelic variant of a polypeptide comprising
the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence
encoded by the cDNA insert of the plasmid deposited with the ATCC as Patent Deposit
Number , PTA-1639, PTA-1846, or , wherein the polypeptide is encoded by a nucleic
acid molecule which hybridizes to a nucleic acid molecule comprising the complement
of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14; and

d) a polypeptide having at least 60% sequence identity to the amino acid sequence of
SEQ ID NO:1, 3, 5, or 7, wherein said polypeptide has biological activity;

comprising culturing the host cell of claim 5 under conditions in which the nucleic acid
molecule is expressed.

13. The method of claim 12 wherein said polypeptide comprises the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7.

14. A method for detecting the presence of a polypeptide of claim 8 in a sample,
comprising:

a) contacting the sample with a compound which selectively binds to a polypeptide of
claim 8; and

b) determining whether the compound binds to the polypeptide in the sample.
15. The method of claim 14, wherein the compound which binds to the polypeptide is an
antibody.

16. A kit comprising a compound which selectively binds to a polypeptide of claim 8
and instructions for use.

17. A method for detecting the presence of a nucleic acid molecule of claim 1 in a
sample, comprising the steps of:

a) contacting the sample with a nucleic acid probe or primer which selectively
hybridizes to the nucleic acid molecule; and

b) determining whether the nucleic acid probe or primer binds to a nucleic acid
molecule in the sample.

18. The method of claim 17, wherein the sample comprises mRNA molecules and is
contacted with a nucleic acid probe.

19. A kit comprising a compound which selectively hybridizes to a nucleic acid
molecule of claim 1 and instructions for use.

20. A method for identifying a compound which binds to a polypeptide of claim 8
comprising the steps of:

a) contacting a polypeptide, or a cell expressing a polypeptide of claim 8, with a test
compound; and

b) determining whether the polypeptide binds to the test compound.

21. The method of claim 20, wherein the binding of the test compound to the
polypeptide is detected by a method selected from the group consisting of:

a) detection of binding by direct detection of test compound/polypeptide binding;

b) detection of binding using a competition binding assay; and

c) detection of binding using an assay for sulfatase activity.
22. A method for modulating the activity of a polypeptide of claim 8 comprising
contacting a polypeptide or a cell expressing a polypeptide of claim 8 with a compound
which binds to the polypeptide in a sufficient concentration to modulate the activity of
the polypeptide.

23. A method for identifying a compound which modulates the activity of a
polypeptide of claim 8, comprising:

a) contacting a polypeptide of claim 8 with a test compound; and

b) determining the effect of the test compound on the activity of the polypeptide to
thereby identify a compound which modulates the activity of the polypeptide.

24. A method for identifying an agent that modulates the level of expression of a
nucleic acid molecule of claim 1 in a cell, said method comprising contacting said
agent with the cell expressing said nucleic acid molecule such that said level of
expression of said nucleic acid molecule can be modulated in said cell by said agent
and measuring said level of expression of said nucleic acid molecule.

25. A method for modulating the level of expression of a nucleic acid molecule of
claim 1, said method comprising contacting said nucleic acid molecule with an agent
under conditions that allow the agent to modulate the level of expression of the nucleic
acid molecule.

26. A pharmaceutical composition containing any of the polypeptides in claim 8 in a
pharmaceutically acceptable carrier.
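Claims 1(a), 8(a), 8(d), and 12(d) recite sequences that are "at least 60%" identical to the disclosed sequences. This excerpt does not restate how identity is calculated, so the Python sketch below uses an assumed convention for illustration only. It starts from two sequences that have already been aligned, with gaps written as "-". Percent identity is taken as the number of identical aligned columns divided by the number of columns in which neither sequence has a gap. The alignment step itself is not shown. The helper names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns with a gap in either sequence are ignored, and
    identity = identical columns / remaining columns x 100. Other conventions
    (for example, dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a.upper(), b.upper())
        for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True if the pre-aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 9 of 10 columns identical
    print(meets_identity_threshold("ACG-T", "ACGAT"))    # gap column is skipped
```

Under this convention the first call returns 90.0. The second returns True because the gapped column is excluded.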
Sequence length 2175
[Figure: nucleotide sequence with deduced amino acid sequence, numbered from the start codon; the coding region ends with a TAA stop codon at nucleotide 1578 (position 526).]
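The sequence figures show each nucleotide sequence with its deduced amino acid sequence in the one-letter code. The figure files are named *.trans. The Python sketch below shows how such a deduction works. It assumes the standard genetic code and a reading frame that starts at the first base of the input. It is not the program used to prepare the figures, and `translate` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order and the third
# codon position varying fastest (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1 into one-letter amino acid codes.

    Codons containing non-ACGT characters become "X". If to_stop is True,
    translation ends at the first stop codon; otherwise stops appear as "*".
    A trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGCACACCCTCACTGGCTTC"))
    print(translate("ATGGCCTAAGGG", to_stop=False))
```

The first call yields MHTLTGF. The second yields MA*G, because translation continues past the stop codon when `to_stop` is False.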
Prosite Pattern Matches

Prosite version: Release 12.2 of February 1995

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.
Query: 117-120, 215-218, 356-359, 497-500

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
Query: 28-30, 93-95, 237-239, 290-292, 422-424

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.
Query: 120-123, 290-293, 335-338, 364-367, 444-447, 499-502

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.
Query: 12-17, 33-38, 52-57, 97-102, 113-118, 158-163, 328-333, 388-393, 418-423, 435-440

>PS00009|PDOC00009|AMIDATION Amidation site.
Query: 382-385

>PDOC00117|SULFATASE_2 Sulfatases signature 2.
Query: starting at residue 129
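Each Prosite entry records a match of a PROSITE consensus pattern at the given residue positions of the query sequence. For example, the N-glycosylation pattern (PS00001) is N-{P}-[ST]-{P}, and the 26212 hits NATL, NVTL, NNSI and NGSW all satisfy it. The Python sketch below converts simple PROSITE patterns to regular expressions and reports 1-based, inclusive match positions, as in the listings. It handles only single residues, x, [..] and {..} sets, repeat counts such as x(2) or x(2,3), and the < and > anchors. It is not ScanProsite, and it reports overlapping matches. The helper names are hypothetical.

```python
import re

# One PROSITE element: optional "<" anchor, then "x", a residue, an allowed set
# [..] or a forbidden set {..}, an optional repeat (n) or (n,m), and optional ">".
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a single residue or an allowed set is already a regex
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if n_anchor else "") + regex + ("$" if c_anchor else ""))
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched residues) with 1-based inclusive positions.

    The pattern is wrapped in a lookahead so that overlapping matches are found.
    """
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in finder.finditer(sequence.upper()):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(scan("N-{P}-[ST]-{P}.", "MNATLNPSA"))
    print(scan("[ST]-x(2)-[DE].", "KTTWEK"))  # casein kinase II pattern
```

The calls print the regex N[^P][ST][^P], the single N-glycosylation hit (2, 5, 'NATL'), and the casein kinase II hit (2, 5, 'TTWE'). The second N in the test sequence is rejected because it is followed by P.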
Output File 23553.trans
Sequence length 4321
[Figure: nucleotide sequence of 23553 with deduced amino acid sequence, numbered from the start codon.]
FIGURE 5A
[Figure, continued: nucleotide and deduced amino acid sequence of 23553; the coding region ends with a TAA stop codon at nucleotide 2616 (position 872), followed by the 3' untranslated region.]
Analysis of 23553 (871 aa)
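The analysis captions give protein lengths of 871 aa for 23553 and 569 aa for 25278. The sequence figures number each coding sequence from the first base of the start codon through the stop codon, ending at nucleotides 2616 and 1710 respectively. The two sets of numbers are related by simple arithmetic: the number of codons is the coding-region length divided by three, and one codon is subtracted for the stop. A minimal Python sketch follows; `protein_length` is a hypothetical helper.

```python
def protein_length(cds_start: int, cds_end: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by nucleotides cds_start..cds_end (1-based, inclusive).

    If includes_stop is True, the last codon is treated as a stop codon and
    is not counted as a residue.
    """
    n_nt = cds_end - cds_start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError(f"a coding region of {n_nt} nt is not a whole number of codons")
    codons = n_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(protein_length(1, 2616))  # 23553 figure numbering
    print(protein_length(1, 1710))  # 25278 figure numbering
```

The two calls give 871 and 569, matching the captions. The 26212 coding range (nt 706-2118) would give 470 residues if it includes the stop codon and 471 if it does not. The text does not say which.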
Prosite Pattern Matches for 23553

Prosite version: Release 12.2 of February 1995

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.
Query: 64-67, 111-114, 131-134, 148-151, 170-173, 197-200, 240-243, 623-626, 773-776, 783-786

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
Query: 24-26, 27-29, 66-68, 96-98, 206-208, 400-402, 425-427, 468-470, 484-486, 488-490, 505-507, 516-518, 520-522, 530-532, 611-613, 615-617, 635-637

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.
Query: 107-110, 288-291, 367-370, 376-379, 452-455, 505-508, 781-784

FIGURE 8A
>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.
Query: 19-24, 161-166, 325-330, 592-597, 763-768, 851-856

>PDOC00117|SULFATASE_1 Sulfatases signature 1.
Query: 85-97
Output File 25278.trans
Sequence length 2940
[Figure: nucleotide sequence of 25278 with deduced amino acid sequence, numbered from the start codon.]
FIGURE 10A
[Figure, continued: nucleotide and deduced amino acid sequence of 25278; the coding region ends with a TGA stop codon at nucleotide 1710 (position 570), followed by the 3' untranslated region and poly(A) tail.]
Analysis of 25278 (569 aa)



7764 e4324l4ieS7 4620322261 
>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.
1 466
>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.
Query: 314 RKGT 317

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.
>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.
>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.
>PS00008|PDOC00008|MYRISTYL N-myristoylation site.
26212 seqs



DNA sequence (nt 706-2118 coding) 

CACGCGTCCGCCCACGCGTCCGTGGAGATATTAACTTTTTTCTTTTTTTTTTTCCTTG^^^^ 

5?g^cSg§5S?gccS^^ 

WSTATCAGATACACACCGGACTTCAACATTCTATCATAAGACCTRCCC^^CCC^^CTGTTTACCTCTG^ 

S2ScSS5g??SgLactatgacaatggcatatactc^^^^^ 

ATJ^CCCCACAAAGCCTATATTTTTATATATTGCCTATCAAGCTGTTCATTCACCACTGCAAGCT^ 
CAC 



Protein sequence 

RSNPRLNGGWGPWYKEETKKKKPSKKQAEKKQKKSKKKKKKQQKAVSGSTCHSGVTCG 
HMM hits: Sulfatase
Prosite Pattern Matches for 26212

Prosite version: Release 12.2 of February 1995 

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.



Query: 157 NATL 160

Query: 306 NVTL 309

Query: 318 NNSI 321

Query: 431 NGSW 434

Query: 497 NITA 500

Query: 527 NKTA 530
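The N-glycosylation hits above are windows matching the PROSITE PS00001 pattern N-{P}-[ST]-{P}: Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro, each reported as a 1-based start, the tetrapeptide and an end position. A minimal Python sketch of such a scan follows, assuming the protein is available as a plain one-letter string; `find_n_glyc_sites` is a hypothetical helper, not part of any Prosite tool.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}
# The pattern sits inside a zero-width lookahead so overlapping
# candidate sites (e.g. "NNTS" followed by "NTSA") are all found.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glyc_sites(protein: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, end) triples with 1-based, inclusive positions."""
    sites = []
    for match in N_GLYC.finditer(protein.upper()):
        motif = match.group(1)
        start = match.start() + 1  # 0-based string index -> residue number
        sites.append((start, motif, start + len(motif) - 1))
    return sites


if __name__ == "__main__":
    # Toy sequence for illustration only (not a sequence from this document).
    for start, motif, end in find_n_glyc_sites("MNATLPNPSA"):
        print(f"Query: {start} {motif} {end}")
```

On the toy input this prints `Query: 2 NATL 5`, in the same layout as the hits listed above; "NPSA" is rejected because Pro follows the Asn.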



>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.

Query: 521 RRLS 524 

Query: 562 KKPS 565 

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.



Query: 131 TGK 133

Query: 189 TRR 191

Query: 243 TQR 245

Query: 413 SPR 415

Query: 489 TGK 491

Query: 509 SNR 511

Query: 559 TKK 561

Query: 576 SKK 578



>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.

FIGURE 18A 
Query: 298 SCLD 301

Query: 347 TYWE 350

Query: 386 SLAE 389

Query: 406 TISE 409

>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.

Query: 163 KLKEVGY 169 

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.

Query: 28 GALAGF 33 

Query: 56 GALLAQ 61 

Query: 139 GLQHSI 144 

Query: 198 GSLLGS 203 

Query: 235 GIYSTQ 240 

Query: 329 GGQPTA 334 

Query: 343 GSKGTY 348 

Query: 351 GGIRAV 356 

Query: 432 GSWAAG 437 

Query: 439 GIWNTA 444 

>PS00149|PDOC00117|SULFATASE_2 Sulfatases signature 2.

Query: 168 GYSTHMVGKW 177 

>PS00523|PDOC00117|SULFATASE_1 Sulfatases signature 1. 
Query: 120 PICTPSRSQFITG 132 

FIGURE 18B 
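Each Prosite entry above is defined by a pattern in PROSITE syntax. For example, PS00005 (PKC) is [ST]-x-[RK], PS00004 (cAMP/cGMP) is [RK](2)-x-[ST], and PS00006 (CK2) is [ST]-x(2)-[DE], which is consistent with hits such as TGK, RRLS and SCLD reported for 26212. The following is a rough Python sketch that converts a basic pattern into a regular expression and scans a sequence. It assumes only the syntax elements named in the docstring. `prosite_to_regex` and `scan` are hypothetical helpers and are not the ScanProsite implementation.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supported: single residues, x (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) and (n,m) repeats, and the < and >
    terminal anchors. Not supported: '>' written inside square brackets.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = anchor_end = ""
        if element.startswith("<"):
            anchor_start, element = "^", element[1:]
        if element.endswith(">"):
            anchor_end, element = "$", element[:-1]
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue letter or a [..] class
        pieces.append(anchor_start + core + repeat + anchor_end)
    return "".join(pieces)


def scan(pattern: str, protein: str) -> list[tuple[int, str, int]]:
    """Return overlapping hits as (start, motif, end), 1-based and inclusive."""
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in finder.finditer(protein.upper()):
        motif = m.group(1)
        hits.append((m.start() + 1, motif, m.start() + len(motif)))
    return hits


if __name__ == "__main__":
    patterns = {
        "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
        "CAMP_PHOSPHO_SITE": "[RK](2)-x-[ST]",
        "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    }
    toy = "MTGKARRLSASCLDG"  # toy sequence, not taken from the listing
    for name, pat in patterns.items():
        print(name, prosite_to_regex(pat), scan(pat, toy))
```

Production tools such as ScanProsite provide further options, for example profiles and match filtering, that this sketch does not attempt to reproduce.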
Alignments of top-scoring domains: ... 5e-93

FK4"i.4.ilaDD4«4^9dlG+ + t t n+D +A*eG+rF ♦+ 
25277 36 pin^viiliMllJMISWGDUSWWAOTKCTMILDKMASEGMRI^ 



82 



2S277 



25277 



ICtPSRAalLTGRyphr tGitortnnragvlpCcgwsleGglpldet: t ipel 

83 TCSPSRASLLTCRLGLRNGVTRNFAV TS-VGGLPLNBTTLAEV 124 

LkeaGYaTgmvGKWHLgyneessasdfahlPlgrGFdy£ygnlGGEdQWY 
L**aGY*Tg«-^QKWIIlg+* ** +P xGFdy*+g 
12S LQQAGYVTCIIGKWHLGHHGSY HPNPRGFDmfFG 



158 



plvdallpftndtytceggygfskdvallcplgalgvnevcapdlcaladyk 



25277 1S9 IPYSH-DMGCT- 



-D 169 



tagalnvphhvfEWadryagavdvgrpElavlifprpaacflypnatWS 

t+o* * V + + + a* ly *v* 

25277 170 TPGVNHPP cPACPQOIX3PSIWM}RDC:Y--TCWAl.PLYEMUIIVE 211 

QprophspltaPrpwqlladealpElemgqrdkpfflylsykhvhiprda 
qp s 1+ Q <-a*+a ♦f^+r* * ♦pf +ly**+h^h^p 

25277 212 QPVNLSSIA QiOfAEKATQFIQRASTSGRPFLLYVALAHMHVP— 253 

prolfsskdfagssrrslYglilDsveemDdgvgrvlnaLdelNGlldnTl 
1+ + a r lYg + + cmD +vg++ * +d + i-nT* 
25277 254 -.I.PVTQLPAAPMBSt.YG W3LWEWDSI,VCQIKDKVDHT--VKENTF 296 

iiFTSllDhGghlgahghlgiragQsngpErg. gKgtnlye 

FT D*G4. 4- 4- Gs gp£ g +^++^.4.4.^.+K+t* +6 

25277 297 LWFTG— DNGPWAQKCEUV GSVGPFTGf wqtrqggspAKQTT-WB 338 

gGtRvPlivrviPeGi iapgcjvsdelvslmDl f PTi IdLAGaplPgvaagv 
gG+RvP****wP G+ •»• ♦ +s s++D*f PT++4LA a«^lP 
25277 339 GGHRVPALAYWP-GBVEVMVTSTALLSVLDIFPTWALAQASbP 381 

kdrilDGvsLlplLlgaagssrhetlCyesycnegrgflpavrwgkkkah 

♦ r DGV4^4^ g+ 4-*+h !£♦+ n AT + 
25277 382 QGRRFDGVDVSEVbFGR-SQPGHRVLFHP— KSG AAGEFOAIiQT «22 

CrtpniagworvdfddwrtcltntvedEnrsgddacrhgdvdcclgkprrs 

25277 423 VRLE ryKAFYITOGAR— ACDGSTGPEUJKKF 45* 

V thhdppllydlsrDP< - • 
pi ♦♦l* D 
25277 453 PLXFNLEODT 462 
Alignments of top-scoring domains:

Sulfatase: domain 1 of 1, from 43 to 467: score 268.9, E = 6.5e-77
*->PHillilaOOIGi9dLGcy<>nptirtpniDrLAeeGlrFtnayvt:tp 
PNi^l+l-frDXH-fr ♦♦IG*- «• -KS P na-a-vttp 

23553 43 nfllLVLTDDQD-VELSSLQ VKNKTRKIKEHOG^TFIKAFVTrP 85 

ICtPSRAalLTGRyphr^Gsi/tnnra^vlpftgwsleGgLpldeCtlpel 
♦C4PSR++ LTG+y h*-»-»-*ytnn+* ♦♦4-4. w+ ++ ♦t^+++ 

23553 86 KCCFSRSSHLTGSYVKNHNVmfNEN — CSSPSWQ AMHEPRTFAVY 129 

LkeaGYaTgncvGKWHlgyneessasdfaniPlcrrG . Fdy EyonlOGEdQW 

23553 130 LNNTGYRTAFFlGKYLNEyNGSY IPPGWReWLGLIKH 1S5 

Yplvdallp C tndtytceggyg fskdvalfcplgaleFvneveapdka lady 
♦♦f*a + c*^g ♦ +♦+♦ ♦♦♦4 dy 
23553 166 ^ SRFYN-yTVCRMG IKEKHGFDYAK DY 190 

ktagalnvphhv£ENadU:yagavdviirpflavlifprx»aacflypnatw 
♦t^++*n ♦ y+4*4- p4-***+* ♦ 
23553 191 FTDLITMES HffFKHSK RKVPHRPVMtOf 1 219 

SQDcnphspl taPrpw^L ladoalp £ lemgqrdkp £ £ lyl eykhvhiprd 
s+ ♦ph p + 4 p+ + + ♦ ♦ ++♦ ♦kh^ ♦♦♦♦ 

2 3 55 3 220 SKAAPKGPED-S APQFSKLyPHASQH-ZTPSYKYAPlOlDKHWIKQYT 264 

apml Esskdfagssrrg 1 Ygli iDsveemDdgvgrvlnaLdelKGl IdnT 
♦pinl^ ♦ ♦£♦ ♦♦r4>4- ♦ ♦♦♦♦♦Dd+v+r-t-^n L e G+l+nT 
23553 265 OFMLPXHHCpmiLQRfCRLQ TLHSVDDSVBlLYMMLVEr-GELENT 309 

liiFTSllDhGghlgahghlgiragGsngpfirggKgtnlyegGtRvPllv 
♦ii*T4 DhG4h*g4-+g4 ♦ yK*+^ y^^-fr+RvP*** 

23553 310 YZZYTA— DHGYHIGQPGLV K-GKSKF-YDPDXRVPFPX 344 

zwPeGiiapffavsdQlvslcaZ>l£prildZAGaplP^aa9vkdrilOGvs 
. ♦pg**f ♦♦v ♦♦Ol4-PTild-fJUa<f+ P ♦OG+s 
23553 345 IU3P— SVEPGSXVPQEVUnDtAPVXLDZJVGLDTP PDVDGKS 364 

LXpl Llgaags srhe 1 1 £yesycnegrg £ Ipavrvrgkkkah frtpni agw 
+14-1L4- ■♦■ ♦+ ♦+£ ♦ ♦ +■•-♦ + ♦• ♦£ 
23553 385 VUCLLDPE KPGNRFRT-NKKAK ZWEIDTFLVERGKF 418 

C|rvd£ddvwkl£ntved£nr89ddacrh9dvckclgkprrsvthhdppll 
♦ k ♦ ♦ + + ♦ ♦ *c ++++ ♦+ ♦♦+p ♦ 

23553 419 LIUaCBESSKHXQQSHHLPKY&RyKE&CQQARy<}1A-CEQPGQK 460 

♦D 

23553 461 WC2CXEDT 467 



FIGURE 20 



WO 01/55411 27/44 PCT/US01/03266






BNSDOCID: <WO 015541 1A2_L> 






Alignments of top-scoring domains:

Sulfatase: domain 1 of 1, from 47 to 471: score 289.7, E = 3.6e-83
*->PMillilaDDlGigdlCeyCnptirtpniDrIAefiGlxrFtiuiyvttp 
P4-i-l-+ll4>DI>K3+ d+G 4G + l-i-tpi"t-DrXA-» Q*** n ♦P 
47 PHIIFILTDDQGVHDVGyUG-SDIETPTLDRLAAKGVKLEN-YyZOP 91 



25278 



25278 



2S278 135 



25278 



25278 



25278 



25278 



25278 



25278 



2S278 



25278 



25278 



IC t PSRAal LTGRyphr tGmy tzm rag-vlp £ tgws leGglpIde 1 1 Ipe 1 
4CtPSR*+lL.TGRy+++tG+** ♦ p+++ -l-lpLd +tlp+ 

92 ICTPSRSQLLTGRYQIHTGLQHSIIR PQQPN CLPUDQVTLPQK 134 

LkcaGYaTgmvGKWHlgyneessasd£ablPlgrGFdy£ygaIGGEdQWY 
Ii*-eAGV T+xavGKWHlg lP+*rGFd+f+g*- 
LQCAiGYSnnCVOKHKLGFrRKBC LPTRRGPOTFtCS 170 



plvdallpftndtytceggygCskdvalkplgalgvneveopdlcaladyk 
1 + d+yt-t-*-* 

171 LTGNVDTOYDM CD 184 

tagalnvphhvfESteidryagav<ivorpflavXifprpaacfLypnatws 

185 GPGVCG : FD LHEGENVAWG 202 

QpttphspltaPrpwqlladealpf Icsmgqrdkpf £lylsy)chvlilprd . 
••-'••S'*-*- ♦a+^a I ♦p fly^+^+'+vh+p 

203 l*SGQYST«L» yAQRASHILASH-SPQRPLFLYVAFQAVHTPLQs 244 

apral f s sfcdf agssrrg lYgl ilDsveemDdgvgrvlxiaLdelHGlldnT 
+ +4.+ 4-+ g+ E-+ Y+ ++V mD4^+v ++ aL*-*- G **n 
245 PREYLYRYRTMGNVARRKYA AMyTCMDEAVRiaTVIAlJUlY-GFYNNS 290 

liiFTSllDhGghlgahghlgiragGsngpfrggKgtnlyegGtRvPliv 
♦liF+S D+Og** gGsn*p+rg+Kgt ♦egG+R ♦♦v 

291 VXZFSS— tXIGGQTF S GGSHWPLRGRKGTY-WBG6VRGLGFV 329 

rwPeGl iapg<ivsdelvsljnDl fPTildtAGaplPgvaagvlcdrilDeSvs 
♦♦P ♦4-S+4-1 ♦* D**PT*"«- l#AO+* ♦ IDG** 

330 HSP-LlJaiKQRTSRALMHITDWYPTl.vai*AGGTTS -AADGLDGYD 372 

laplLlgaagssrhetLfye Bycnegrgflpavrwgkkkalif rt 

373 VWPAXSBGRftSPRTErumidplynh»jQHGSLEG-----GPCn^ 417 

pni . agMqrvdfddvwklfntvedEnrsgddaerligdvckoIglEprrs vt 
♦ + w ♦ ♦ ♦♦d* a ♦ g ♦ ♦ 
418 AZRvGEWK LLTGDPCVGDWIPPQTIATFPGSWWNLER MAS 457 

hhdppllydlsrDP<- • 
♦ 1+++S+DP 
458 VRQAVWLFKtSADP 471 
Colon N
Colon T
Colon T 
Colon T 



DCIS 
Normal 
IDC 
DC 
ILC 
IDC 
IDODCIS 
Normal
Normal
Normal
Normal
AC

SCC

AC
AC
AC

SCC

SCC
Normal
Normal
Normal
Normal
Adeno 
Adeno 
Adeno 



Relative 
Expression 
47.84 

52.89 

44.79 

29.55 

43.26 

60.13 

20.11 

36.00 

26.54 

31.45 

17.57 

31.45 

35.02 

27.19 

3.89 

5.74 

47.18 

42.37 

2.37 

16.34 
CHT528
CHT386
CHT372
CHT532
CHT321
CHT339



Colon T 
Liver Met
Liver Met
Liver Met
Liver Met
Liver N
Liver N
Liver
Adeno
Met
Met 
Met 
Normal 
Normal 
Normal 
Normal 



Relative 
Expression 

11.63 

372^ 

2.39 

4.45 

23.43 

11.35 

30.38 

46.21 

7.31 

9.30 

1.77 

1.58 



PIT 265 Breast N 

MDA 335 Breast N 

NDR 132 Breast T 

NDR 13 Breast N 

NDR 56 Breast N 



Normal 37.40

Normal 45.57 

DCIS 10.56 
Normal 20.61
Alignments of top-scoring domains:

Sulfatase: domain 1 of 1, from 76 to 502: score 324.5, E = 1.3e-93

*->PNvllilaDDlGigdlgcyghptirTPnldrIxA.eeGlrFtnhytatp 

P+ ++ilaDD+G4- d+g++g ++i TP+ld+liA+eG+++ n-i-y+ +p 
26212 76 pHLIPILADDQGPRDVGYHG-SEIKTPTLDKIAAEGVKLEirzyV-QP 120 

iCsPSRAaLlTGryphrhGmvsngrlgvlgftaksgglpldettLpelLk 
+C+PSR+++ TG+y+-i-++G + + + ++ +lpld +tljp+ Lk 

121 ICTPSRSQFITGKYQrHTGIiQH SIIRPTQPNCIiPLDNATLPQKLK 165 

eaCYaTglvGKWHlglnensdaagdgehlPlgv/rGfdyfdgflygspfty 
e GY T++vGKWHlg4-++ +e+ P++ rGfd f +g 1+gs +4-y 

166 EVGYSTHMVGKWHLGFYR KECMPTR-RGFDTPFGSLl.aSGDyY 207 

deencdngegteppeaypeqgwlpqilgyyltdlladkalglldvasaag 
++ cd +P+ ++++1+ aa 

208 THYKCD SPGM CGYDLYENDNAA- 229 

r 1 lakalaas rPFf lyi sppaphf s i If rnf kevaqpyrapql tql f vde 

++++ + ++tq++-»-++ 

230 WDYD NGIYSTQMYTQR 245 

aadf iernk . ekPf f lylaf Irlhvhtplf spaedleskdf IgrsqrgrY 
kP fly a++ +vh pl++p + e+++ r+rY 
246 VQQIIiASHNpTKPIFIiYIAYQ--AVHSPIiQAPGRYFEHYRSIININRRRy 293 

gdlveemDdlvGrvldaLedlGlldNTlvifTSDnGahlegtpewygggn 
^.++^.+ alj+ G ++N ++i++SDnG g+P+ +gg+n 
294 AAMIiSCIiDEAINNVTLAIiKTYGFYNNSIIIYSSDNG GQPT-AGGSN 33 6 

gplkggKgygslyeGgiRvPllvrwPggiapagrvkekselvshvDlaPT 
+pl+g Kg+ +eGgiR ++V++P + +g+v + elv-»-+ D++PT 
339 WPLRGSKGTY--WEGGIRAVGFVHSP-LLKiIKGTVCK--EriVHrTDWYPT 383 

ildlAGaplPkvanGakdrplDGvsllplllggaapsrrahetlfhyngk 
+ +1A + ++ d 1DG++-I-+ + +g + s-J- + '♦•■i-+h+ 
384 LISLAEGQIDE DIQLDGYDIWETISEGLR-SP--RVDII1HN 421 

grklravrwprksgktpklkahf f tpaf 

++ ++ +k:+ + + a + ++ ++ + ++ + +++++ ++ 
422 '"-IDPIYTKAKN GSWAAGYGIWNTaiqsairvqhwklltgnpgysd 465 

dddtnngwecvgtvsqaddiedcr cegve tvthhdppe lyDl s rDP 

+ ++ e t+ + +l-t-+ ++DP 



26212 



26212 



26212 



26212 



26212 



26212 



26212 



26212 



26212 



++++ n+g 
26212 466 wvppQSFSNLG- 



-PHRWHNER-ITLSTGKSVWIjFNITADP 502 



26212 



FIGURE 29 
Expression of 26212 in proliferating and arresting EC
HUVEC HMVEC-Cardiac HMVEC-Lung

FIGURE 30
26212.1 Expression in Oncology Plate I

FIGURE 31A
26212.1 Expression in Oncology Plate II
26212.1 Expression in Clinical Breast Samples
FIGURE 32
26212.1 Expression in Clinical Lung Samples
FIGURE 34
26212.2 Expression in the Angiogenesis Panel

FIGURE 35
SEQUENCE LISTING

<110> Glucksman, Maria Alexandra 
Williamson, Mark 
Tsia, Fong-Ying 
Rudolph-Owen, Laura A. 

<120> 22438, 23553, 25278, and 26212 Novel 
Human Sulfatases (A CIP Application) 



<130> 35800/208709 

<151> 

<160> 14 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 525 
<212> PRT 

<213> homo sapiens

<400> 1 

Met Gly Trp Leu Phe Leu Lys Val Leu Leu Ala Gly Val Ser Phe Ser 

1 5 10 15

Gly Phe Leu Tyr Pro Leu Val Asp Phe Cys Ile Ser Gly Lys Thr Arg

20 25 30

Gly Gln Lys Pro Asn Phe Val Ile Ile Leu Ala Asp Asp Met Gly Trp

35 40 45 

Gly Asp Leu Gly Ala Asn Trp Ala Glu Thr Lys Asp Thr Ala Asn Leu 

50 55 60 

Asp Lys Met Ala Ser Glu Gly Met Arg Phe Val Asp Phe His Ala Ala 
65 70 75 80 

Ala Ser Thr Cys Ser Pro Ser Arg Ala Ser Leu Leu Thr Gly Arg Leu 

85 90 95 

Gly Leu Arg Asn Gly Val Thr Arg Asn Phe Ala Val Thr Ser Val Gly 

100 105 110 

Gly Leu Pro Leu Asn Glu Thr Thr Leu Ala Glu Val Leu Gln Gln Ala

115 120 125

Gly Tyr Val Thr Gly Ile Ile Gly Lys Trp His Leu Gly His His Gly

130 135 140

Ser Tyr His Pro Asn Phe Arg Gly Phe Asp Tyr Tyr Phe Gly Ile Pro
145 150 155 160 

Tyr Ser His Asp Met Gly Cys Thr Asp Thr Pro Gly Tyr Asn His Pro 

165 170 175 

Pro Cys Pro Ala Cys Pro Gln Gly Asp Gly Pro Ser Arg Asn Leu Gln

180 185 190

Arg Asp Cys Tyr Thr Asp Val Ala Leu Pro Leu Tyr Glu Asn Leu Asn

195 200 205

Ile Val Glu Gln Pro Val Asn Leu Ser Ser Leu Ala Gln Lys Tyr Ala

210 215 220

Glu Lys Ala Thr Gln Phe Ile Gln Arg Ala Ser Thr Ser Gly Arg Pro
225 230 235 240

Phe Leu Leu Tyr Val Ala Leu Ala His Met His Val Pro Leu Pro Val

245 250 255

Thr Gln Leu Pro Ala Ala Pro Arg Gly Arg Ser Leu Tyr Gly Ala Gly

260 265 270

Leu Trp Glu Met Asp Ser Leu Val Gly Gln Ile Lys Asp Lys Val Asp

275 280 285 

His Thr Val Lys Glu Asn Thr Phe Leu Trp Phe Thr Gly Asp Asn Gly 

290 295 300 

Pro Trp Ala Gln Lys Cys Glu Leu Ala Gly Ser Val Gly Pro Phe Thr
305 310 315 320

Gly Phe Trp Gln Thr Arg Gln Gly Gly Ser Pro Ala Lys Gln Thr Thr

325 330 335

Trp Glu Gly Gly His Arg Val Pro Ala Leu Ala Tyr Trp Pro Gly Arg

340 345 350

Val Pro Val Asn Val Thr Ser Thr Ala Leu Leu Ser Val Leu Asp Ile

355 360 365

Phe Pro Thr Val Val Ala Leu Ala Gln Ala Ser Leu Pro Gln Gly Arg

370 375 380

Arg Phe Asp Gly Val Asp Val Ser Glu Val Leu Phe Gly Arg Ser Gln
385 390 395 400 

Pro Gly His Arg Val Leu Phe His Pro Asn Ser Gly Ala Ala Gly Glu 

405 410 415 

Phe Gly Ala Leu Gln Thr Val Arg Leu Glu Arg Tyr Lys Ala Phe Tyr

420 425 430

Ile Thr Gly Gly Ala Arg Ala Cys Asp Gly Ser Thr Gly Pro Glu Leu

435 440 445

Gln His Lys Phe Pro Leu Ile Phe Asn Leu Glu Asp Asp Thr Ala Glu

450 455 460

Ala Val Pro Leu Glu Arg Gly Gly Ala Glu Tyr Gln Ala Val Leu Pro
465 470 475 480

Glu Val Arg Lys Val Leu Ala Asp Val Leu Gln Asp Ile Ala Asn Asp

485 490 495

Asn Ile Ser Ser Ala Asp Tyr Thr Gln Asp Pro Ser Val Thr Pro Cys

500 505 510

Cys Asn Pro Tyr Gln Ile Ala Cys Arg Cys Gln Ala Ala
515 520 525 
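SEQ ID NO:1 is written in three-letter codes, while the domain alignments and Prosite hits in the figures use one-letter codes. The following is a small Python sketch that converts listing rows so that the two can be compared directly. It assumes whitespace-separated tokens. `three_to_one` is a hypothetical helper. The sketch skips numeric position markers and maps any unrecognized token to X, so damaged residues remain visible instead of being silently dropped.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(listing_text: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string.

    Purely numeric tokens (position markers such as "1 5 10 15") are skipped;
    any other unrecognized token, e.g. an OCR-damaged "Gin", becomes 'X'.
    """
    residues = []
    for token in listing_text.split():
        if token.isdigit():
            continue
        residues.append(THREE_TO_ONE.get(token, "X"))
    return "".join(residues)
```

Applied to residues 83-124 of SEQ ID NO:1, this sketch should yield TCSPSRASLLTGRLGLRNGVTRNFAVTSVGGLPLNETTLAEV, which is the query segment of the second row of the 25277 domain alignment.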

<210> 2 
<211> 2175 
<212> DNA 
<213> homo sapiens 

<220> 
<221> CDS 

<222> (248)...(1825)
<400> 2 

cacgcgtccg caaatttcct gattcttttg aattaggatt ccagatgggg gcctcatttc 60
tacagccccc aacattccta tagccgttat cactgccatc accactgcca ccagcatctt 120
cttgcagatt ccacccctgc tccccagaga cttcctgctt tgaaagtgag cagaaaggaa 180
gctctcagaa aaatctctag tggtggctgc cgtcgctcca gacaatcgga atcctgcctt 240
caccacc atg ggc tgg ctt ttt cta aag gtt ttg ttg gcg gga gtg agt 289
Met Gly Trp Leu Phe Leu Lys Val Leu Leu Ala Gly Val Ser
1 5 10

ttc tca gga ttt ctt tat cct ctt gtg gat ttt tgc atc agt ggg aaa 337
Phe Ser Gly Phe Leu Tyr Pro Leu Val Asp Phe Cys Ile Ser Gly Lys
15 20 25 30

aca aga gga cag aag cca aac ttt gtg att att ttg gcc gat gac atg 385
Thr Arg Gly Gln Lys Pro Asn Phe Val Ile Ile Leu Ala Asp Asp Met
35 40 45

ggg tgg ggt gac ctg gga gca aac tgg gca gaa aca aag gac act gcc 433
Gly Trp Gly Asp Leu Gly Ala Asn Trp Ala Glu Thr Lys Asp Thr Ala
50 55 60

aac ctt gat aag atg gct tcg gag gga atg agg ttt gtg gat ttc cat 481
Asn Leu Asp Lys Met Ala Ser Glu Gly Met Arg Phe Val Asp Phe His
65 70 75

gca gct gcc tcc acc tgc tca ccc tcc cgg gct tcc ttg ctc acc ggc 529
Ala Ala Ala Ser Thr Cys Ser Pro Ser Arg Ala Ser Leu Leu Thr Gly
80 85 90



cgg ctt ggc ctt cgc aat gga gtc aca cgc aac ttt gca gtc act tct 577
Arg Leu Gly Leu Arg Asn Gly Val Thr Arg Asn Phe Ala Val Thr Ser
95 100 105 110

gtg gga ggc ctt ccg ctc aac gag acc acc ttg gca gag gtg ctg cag 625
Val Gly Gly Leu Pro Leu Asn Glu Thr Thr Leu Ala Glu Val Leu Gln
115 120 125

cag gcg ggt tac gtc act ggg ata ata ggc aaa tgg cat ctt gga cac 673
Gln Ala Gly Tyr Val Thr Gly Ile Ile Gly Lys Trp His Leu Gly His
130 135 140

cac ggc tct tat cac ccc aac ttc cgt ggt ttt gat tac tac ttt gga 721
His Gly Ser Tyr His Pro Asn Phe Arg Gly Phe Asp Tyr Tyr Phe Gly
145 150 155

atc cca tat agc cat gat atg ggc tgt act gat act cca ggc tac aac 769
Ile Pro Tyr Ser His Asp Met Gly Cys Thr Asp Thr Pro Gly Tyr Asn
160 165 170

cac cct cct tgt cca gcg tgt cca cag ggt gat gga cca tca agg aac 817
His Pro Pro Cys Pro Ala Cys Pro Gln Gly Asp Gly Pro Ser Arg Asn
175 180 185 190

ctt caa aga gac tgt tac act gac gtg gcc ctc cct ctt tat gaa aac 865
Leu Gln Arg Asp Cys Tyr Thr Asp Val Ala Leu Pro Leu Tyr Glu Asn
195 200 205

ctc aac att gtg gag cag ccg gtg aac ttg agc agc ctt gcc cag aag 913
Leu Asn Ile Val Glu Gln Pro Val Asn Leu Ser Ser Leu Ala Gln Lys
210 215 220

tat gct gag aaa gca acc cag ttc atc cag cgt gca agc acc agc ggg 961
Tyr Ala Glu Lys Ala Thr Gln Phe Ile Gln Arg Ala Ser Thr Ser Gly
225 230 235

agg ccc ttc ctg ctc tat gtg gct ctg gcc cac atg cac gtg ccc tta 1009
Arg Pro Phe Leu Leu Tyr Val Ala Leu Ala His Met His Val Pro Leu
240 245 250

ccc gtg act cag cta cca gca gcg cca cgg ggc aga agc ctg tat ggt 1057
Pro Val Thr Gln Leu Pro Ala Ala Pro Arg Gly Arg Ser Leu Tyr Gly
255 260 265 270

gca ggg ctc tgg gag atg gac agt ctg gtg ggc cag atc aag gac aaa 1105
Ala Gly Leu Trp Glu Met Asp Ser Leu Val Gly Gln Ile Lys Asp Lys
275 280 285

gtt gac cac aca gtg aag gaa aac aca ttc ctc tgg ttt aca gga gac 1153
Val Asp His Thr Val Lys Glu Asn Thr Phe Leu Trp Phe Thr Gly Asp
290 295 300

aat ggc ccg tgg gct cag aag tgt gag cta gcg ggc agt gtg ggt ccc 1201
Asn Gly Pro Trp Ala Gln Lys Cys Glu Leu Ala Gly Ser Val Gly Pro
305 310 315

ttc act gga ttt tgg caa act cgt caa ggg gga agt cca gcc aag cag 1249
Phe Thr Gly Phe Trp Gln Thr Arg Gln Gly Gly Ser Pro Ala Lys Gln
320 325 330

acg acc tgg gaa gga ggg cac cgg gtc cca gca ctg gct tac tgg cct 1297
Thr Thr Trp Glu Gly Gly His Arg Val Pro Ala Leu Ala Tyr Trp Pro
335 340 345 350

ggc aga gtt cca gtt aat gtc acc agc act gcc ttg tta agc gtg ctg 1345
Gly Arg Val Pro Val Asn Val Thr Ser Thr Ala Leu Leu Ser Val Leu
355 360 365

gac att ttt cca act gtg gta gcc ctg gcc cag gcc agc tta cct caa 1393
Asp Ile Phe Pro Thr Val Val Ala Leu Ala Gln Ala Ser Leu Pro Gln
370 375 380

gga cgg cgc ttt gat ggt gtg gac gtc tcc gag gtg ctc ttt ggc cgg 1441
Gly Arg Arg Phe Asp Gly Val Asp Val Ser Glu Val Leu Phe Gly Arg
385 390 395

tca cag cct ggg cac agg gtg ctg ttc cac ccc aac agc ggg gca gct 1489
Ser Gln Pro Gly His Arg Val Leu Phe His Pro Asn Ser Gly Ala Ala
400 405 410

gga gag ttt gga gcc ctg cag act gtc cgc ctg gag cgt tac aag gcc 1537
Gly Glu Phe Gly Ala Leu Gln Thr Val Arg Leu Glu Arg Tyr Lys Ala
415 420 425 430

ttc tac att acc ggt gga gcc agg gcg tgt gat ggg agc acg ggg cct 1585
Phe Tyr Ile Thr Gly Gly Ala Arg Ala Cys Asp Gly Ser Thr Gly Pro
435 440 445

gag ctg cag cat aag ttt cct ctg att ttc aac ctg gaa gac gat acc 1633
Glu Leu Gln His Lys Phe Pro Leu Ile Phe Asn Leu Glu Asp Asp Thr
450 455 460

gca gaa gct gtg ccc cta gaa aga ggt ggt gcg gag tac cag gct gtg 1681
Ala Glu Ala Val Pro Leu Glu Arg Gly Gly Ala Glu Tyr Gln Ala Val
465 470 475

ctg ccc gag gtc aga aag gtt ctt gca gac gtc ctc caa gac att gcc 1729
Leu Pro Glu Val Arg Lys Val Leu Ala Asp Val Leu Gln Asp Ile Ala
480 485 490

aac gac aac atc tcc agc gca gat tac act cag gac cct tca gta act 1777
Asn Asp Asn Ile Ser Ser Ala Asp Tyr Thr Gln Asp Pro Ser Val Thr
495 500 505 510

ccc tgc tgt aat ccc tac caa att gcc tgc cgc tgt caa gcc gca taa 1825
Pro Cys Cys Asn Pro Tyr Gln Ile Ala Cys Arg Cys Gln Ala Ala *
515 520 525

cagaccaatt tttattccac gaggaggagt acctggaaat taggcaagtt tgcttccaaa 1885
tttcattttt accctcttta caaacacacg ctttagttta gtcttggagt ttagttttgg 1945
agttagcctt gcatatccct tctgtatcct gtccctcctc cacgccgacc cgagagcagc 2005
tgagctgcgc tggctctggg cagggagtgt gccttaatgg gaagcacacg ggctttggag 2065
tcaggcacag gtgccagctc cagcttttga acttgggcaa ttgtttaacc taacctgcaa 2125
gttgattttg agggttaaat aaaggcatac atgaaaaaaa aaaaaaaaaa 2175
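One quick consistency check between a codon row and the amino acid row printed beneath it is to re-translate the codons. Below is a minimal Python sketch that uses the standard genetic code. It assumes a/c/g/t input, in either case, with optional spaces. `translate` and `CODON_TABLE` are illustrative names. Any triplet that is not a valid codon is returned as X rather than guessed.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence to one-letter protein, stopping at a stop codon.

    Spaces are ignored, so listing rows such as "atg ggc tgg" can be pasted
    directly; a trailing partial codon is ignored.
    """
    seq = cds.lower().replace(" ", "")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Residues 1-14 of SEQ ID NO:2 (Met Gly Trp ... Gly Val Ser).
    print(translate("atg ggc tgg ctt ttt cta aag gtt ttg ttg gcg gga gtg agt"))
```

Translating the final coding row (ccc tgc ... gca taa) in the same way should stop at the taa codon and return PCCNPYQIACRCQAA, matching the last residues of SEQ ID NO:1.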



<210> 3 

<211> 871 

<212> PRT 

<213> homo sapiens 



<400> 3
Met Lys Tyr Ser Cys Cys Ala Leu Val Leu Ala Val Leu Gly Thr Glu
1 5 10 15

Leu Leu Gly Ser Leu Cys Ser Thr Val Arg Ser Pro Arg Phe Arg Gly
20 25 30

Arg Ile Gln Gln Glu Arg Lys Asn Ile Arg Pro Asn Ile Ile Leu Val
35 40 45

Leu Thr Asp Asp Gln Asp Val Glu Leu Gly Ser Leu Gln Val Met Asn
50 55 60

Lys Thr Arg Lys Ile Met Glu His Gly Gly Ala Thr Phe Ile Asn Ala
65 70 75 80

Phe Val Thr Thr Pro Met Cys Cys Pro Ser Arg Ser Ser Met Leu Thr
85 90 95

Gly Lys Tyr Val His Asn His Asn Val Tyr Thr Asn Asn Glu Asn Cys
100 105 110

Ser Ser Pro Ser Trp Gln Ala Met His Glu Pro Arg Thr Phe Ala Val
115 120 125

Tyr Leu Asn Asn Thr Gly Tyr Arg Thr Ala Phe Phe Gly Lys Tyr Leu

130 135 140 

Asn Glu Tyr Asn Gly Ser Tyr Ile Pro Pro Gly Trp Arg Glu Trp Leu
145 150 155 160 

Gly Leu Ile Lys Asn Ser Arg Phe Tyr Asn Tyr Thr Val Cys Arg Asn

165 170 175 

Gly Ile Lys Glu Lys His Gly Phe Asp Tyr Ala Lys Asp Tyr Phe Thr

180 185 190 

Asp Leu Ile Thr Asn Glu Ser Ile Asn Tyr Phe Lys Met Ser Lys Arg

195 200 205 

Met Tyr Pro His Arg Pro Val Met Met Val Ile Ser His Ala Ala Pro

210 215 220 

His Gly Pro Glu Asp Ser Ala Pro Gln Phe Ser Lys Leu Tyr Pro Asn
225 230 235 240 

Ala Ser Gln His Ile Thr Pro Ser Tyr Asn Tyr Ala Pro Asn Met Asp

245 250 255 

Lys His Trp Ile Met Gln Tyr Thr Gly Pro Met Leu Pro Ile His Met

260 265 270 

Glu Phe Thr Asn Ile Leu Gln Arg Lys Arg Leu Gln Thr Leu Met Ser

275 280 285 

Val Asp Asp Ser Val Glu Arg Leu Tyr Asn Met Leu Val Glu Thr Gly 

290 295 300 

Glu Leu Glu Asn Thr Tyr Ile Ile Tyr Thr Ala Asp His Gly Tyr His
305 310 315 320 

Ile Gly Gln Phe Gly Leu Val Lys Gly Lys Ser Met Pro Tyr Asp Phe

325 330 335 

Asp Ile Arg Val Pro Phe Phe Ile Arg Gly Pro Ser Val Glu Pro Gly

340 345 350 

Ser Ile Val Pro Gln Ile Val Leu Asn Ile Asp Leu Ala Pro Thr Ile

355 360 365 

Leu Asp Ile Ala Gly Leu Asp Thr Pro Pro Asp Val Asp Gly Lys Ser

370 375 380 

Val Leu Lys Leu Leu Asp Pro Glu Lys Pro Gly Asn Arg Phe Arg Thr 
385 390 395 400 

Asn Lys Lys Ala Lys Ile Trp Arg Asp Thr Phe Leu Val Glu Arg Gly

405 410 415 

Lys Phe Leu Arg Lys Lys Glu Glu Ser Ser Lys Asn Ile Gln Gln Ser

420 425 430 

Asn His Leu Pro Lys Tyr Glu Arg Val Lys Glu Leu Cys Gln Gln Ala

435 440 445 

Arg Tyr Gln Thr Ala Cys Glu Gln Pro Gly Gln Lys Trp Gln Cys Ile

450 455 460 

Glu Asp Thr Ser Gly Lys Leu Arg Ile His Lys Cys Lys Gly Pro Ser
465 470 475 480 

Asp Leu Leu Thr Val Arg Gln Ser Thr Arg Asn Leu Tyr Ala Arg Gly

485 490 495 

Phe His Asp Lys Asp Lys Glu Cys Ser Cys Arg Glu Ser Gly Tyr Arg 

500 505 510 

Ala Ser Arg Ser Gln Arg Lys Ser Gln Arg Gln Phe Leu Arg Asn Gln

515 520 525 

Gly Thr Pro Lys Tyr Lys Pro Arg Phe Val His Thr Arg Gln Thr Arg

530 535 540 

Ser Leu Ser Val Glu Phe Glu Gly Glu Ile Tyr Asp Ile Asn Leu Glu
545 550 555 560

Glu Glu Glu Glu Leu Gln Val Leu Gln Pro Arg Asn Ile Ala Lys Arg

565 570 575 

His Asp Glu Gly His Lys Gly Pro Arg Asp Leu Gln Ala Ser Ser Gly

580 585 590 

Gly Asn Arg Gly Arg Met Leu Ala Asp Ser Ser Asn Ala Val Gly Pro 

595 600 605 

Pro Thr Thr Val Arg Val Thr His Lys Cys Phe Ile Leu Pro Asn Asp

610 615 620 

Ser Ile His Cys Glu Arg Glu Leu Tyr Gln Ser Ala Arg Ala Trp Lys
625 630 635 640 

Asp His Lys Ala Tyr Ile Asp Lys Glu Ile Glu Ala Leu Gln Asp Lys

645 650 655 

Ile Lys Asn Leu Arg Glu Val Arg Gly His Leu Lys Arg Arg Lys Pro

660 665 670

Glu Glu Cys Ser Cys Ser Lys Gln Ser Tyr Tyr Asn Lys Glu Lys Gly

675 680 685 

Val Lys Lys Gln Glu Lys Leu Lys Ser His Leu His Pro Phe Lys Glu

690 695 700 

Ala Ala Gln Glu Val Asp Ser Lys Leu Gln Leu Phe Lys Glu Asn Asn
705 710 715 720 

Arg Arg Arg Lys Lys Glu Arg Lys Glu Lys Arg Arg Gln Arg Lys Gly

725 730 735 

Glu Glu Cys Ser Leu Pro Gly Leu Thr Cys Phe Thr His Asp Asn Asn

740 745 750 

His Trp Gln Thr Ala Pro Phe Trp Asn Leu Gly Ser Phe Cys Ala Cys

755 760 765 

Thr Ser Ser Asn Asn Asn Thr Tyr Trp Cys Leu Arg Thr Val Asn Glu 

770 775 780 

Thr His Asn Phe Leu Phe Cys Glu Phe Ala Thr Gly Phe Leu Glu Tyr 
785 790 795 800 

Phe Asp Met Asn Thr Asp Pro Tyr Gln Leu Thr Asn Thr Val His Thr

805 810 815 

Val Glu Arg Gly Ile Leu Asn Gln Leu His Val Gln Leu Met Glu Leu

820 825 830 

Arg Ser Cys Gln Gly Tyr Lys Gln Cys Asn Pro Arg Pro Lys Asn Leu

835 840 845 

Asp Val Gly Asn Lys Asp Gly Gly Ser Tyr Asp Leu His Arg Gly Gln

850 855 860 

Leu Trp Asp Gly Trp Glu Gly 
865 870 
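The <222> coordinates of a CDS can be checked against the <211> length of the protein it encodes. The following is a small Python sketch that assumes the CDS span includes the stop codon, as it does in SEQ ID NO:2, whose coding rows end with taa at nucleotide 1825. `cds_protein_length` is a hypothetical helper.

```python
def cds_protein_length(start: int, end: int, stop_included: bool = True) -> int:
    """Residues encoded by a CDS given 1-based, inclusive <222> coordinates.

    stop_included=True assumes the last codon of the span is the stop codon,
    which is then not counted as a residue.
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is {length} nt, not a whole number of codons")
    codons = length // 3
    return codons - 1 if stop_included else codons


if __name__ == "__main__":
    # (start, end, residues listed in <211> of the encoded protein)
    checks = [(248, 1825, 525), (510, 3125, 871)]
    for start, end, listed in checks:
        computed = cds_protein_length(start, end)
        print(f"({start})...({end}): {computed} residues; listed {listed}")
```

The span (248)...(1825) covers 1,578 nt. That is 526 codons including the stop, or 525 residues, which matches SEQ ID NO:1. The span (510)...(3125) gives 872 codons, or 871 residues, which matches the length of SEQ ID NO:3.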

<210> 4 

<211> 4321 

<212> DNA 

<213> homo sapiens 



<220>
<221> CDS
<222> (510)...(3125)

<400> 4

cccacgcgtc cggctaatga atcttggggc cggtgtcggg ccggggcggc ttgatcggca 60
actaggaaac cccaggcgca gaggccagga gcgagggcag cgaggatcag aggccaggcc 120
ttcccggctg ccggcgctcc tcggaggtca gggcagatga ggaacatgac tctccccctt 180
cggaggagga aggaagtccc gctgccacct tatctctgct cctctgcctc ctccctgttc 240
ccagagcttt ttctctagag aagattttga aggcggcttt tgtgctgacg gccacccacc 300
atcatctaaa gaagataaac ttggcaaatg acatgcaggt tcttcaaggc agaataattg 360
cagaaaatct tcaaaggacc ctatctgcag atgttctgaa tacctctgag aatagagatt 420
gattattcaa ccaggatacc taattcaaga actccagaaa tcaggagacg gagacatttt 480
gtcagttttg caacattgga ccaaataca atg aag tat tct tgc tgt gct ctg 533
Met Lys Tyr Ser Cys Cys Ala Leu
1 5



gtt ttg get gtc ctg ggc aca gaa ttg ctg gga age etc tgt teg act 581 
Val Leu Ala Val Leu Gly Thr Glu Leu Leu Gly Ser Leu Cys Ser Thr 
10 15 20 

gtc aga tec ccg agg ttc aga gga egg ata cag cag gaa cga aaa aac 629 
Val Arg Ser Pro Arg Phe Arg Gly Arg He Gin Gin Glu Arg Lys Asn 
25 30 35 40 

ate cga cec aac att att ctt gtg ett ace gat gat caa gat gtg gag 677 
He Arg Pro Asn He He Leu Val Leu Thr Asp Asp Gin Asp Val Glu 
45 50 55 

ctg ggg tec ctg caa gtc atg aac aaa aeg aga aag att atg gaa cat 725 
Leu Gly Ser Leu Gin Val Met Asn Lys Thr Arg Lys He Met Glu His 
60 65 70 

ggg ggg gcc acc ttc ate aat gee ttt gtg act aca cec atg tgc tgc 773 

6 
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Gly Gly Ala Thr Phe lie Asn Ala Phe Val Thr Thr Pro Met Cys Cys 
75 80 85 

ccg tea egg tec tec atg etc acc ggg aag tat gtg cac aat cae aat 821 
Pro Ser Arg Ser Ser Met Leu Thr Gly Lys Tyr Val His Asn His Asn 
90 95 100 



gtc tac acc aac aac gag aac tgc tct tec ecc teg tgg cag gee atg 
Val Tyr Thr Asn Asn Glu Asn Cys Ser Ser Pro Ser Trp Gin Ala Met 
105 110 115 120 



869 



cat gag cet egg act ttt get gta tat ctt aac aac act ggc tac aga 917 
His Glu Pro Arg Thr Phe Ala Val Tyr Leu Asn Asn Thr Gly Tyr Arg 
125 130 135 

aca gee ttt ttt gga aaa tac etc aat gaa tat aat ggc age tac ate 965 
Thr Ala Phe Phe Gly Lys Tyr Leu Asn Glu Tyr Asn Gly Ser Tyr lie 
140 145 150 

ccc cct ggg tgg ega gaa tgg ctt gga tta ate aag aat tct cgc ttc 1013 
Pro Pro Gly Trp Arg Glu Trp Leu Gly Leu He Lys Asn Ser Arg Phe 
155 160 165 

tat aat tac act gtt tgt cgc aat ggc ate aaa gaa aag cat gga ttt 1061 
Tyr Asn Tyr Thr Val Cys Arg Asn Gly He Lys Glu Lys His Gly Phe 
170 175 180 

gat tat gca aag gac tac ttc aca gac tta atc act aac gag agc att 1109
Asp Tyr Ala Lys Asp Tyr Phe Thr Asp Leu Ile Thr Asn Glu Ser Ile
185 190 195 200

aat tac ttc aaa atg tct aag aga atg tat ccc cat agg ccc gtt atg 1157
Asn Tyr Phe Lys Met Ser Lys Arg Met Tyr Pro His Arg Pro Val Met
205 210 215

atg gtg atc agc cac gct gcg ccc cac ggc ccc gag gac tca gcc cca 1205
Met Val Ile Ser His Ala Ala Pro His Gly Pro Glu Asp Ser Ala Pro
220 225 230

cag ttt tct aaa ctg tac ccc aat gct tcc caa cac ata act cct agt 1253
Gln Phe Ser Lys Leu Tyr Pro Asn Ala Ser Gln His Ile Thr Pro Ser
235 240 245

tat aac tat gca cca aat atg gat aaa cac tgg att atg cag tac aca 1301
Tyr Asn Tyr Ala Pro Asn Met Asp Lys His Trp Ile Met Gln Tyr Thr
250 255 260

gga cca atg ctg ccc atc cac atg gaa ttt aca aac att cta cag cgc 1349
Gly Pro Met Leu Pro Ile His Met Glu Phe Thr Asn Ile Leu Gln Arg
265 270 275 280

aaa agg ctc cag act ttg atg tca gtg gat gat tct gtg gag agg ctg 1397
Lys Arg Leu Gln Thr Leu Met Ser Val Asp Asp Ser Val Glu Arg Leu
285 290 295

tat aac atg ctc gtg gag acg ggg gag ctg gag aat act tac atc att 1445
Tyr Asn Met Leu Val Glu Thr Gly Glu Leu Glu Asn Thr Tyr Ile Ile
300 305 310

tac acc gcc gac cat ggt tac cat att ggg cag ttt gga ctg gtc aag 1493
Tyr Thr Ala Asp His Gly Tyr His Ile Gly Gln Phe Gly Leu Val Lys
315 320 325

ggg aaa tcc atg cca tat gac ttt gat att cgt gtg cct ttt ttt att 1541
Gly Lys Ser Met Pro Tyr Asp Phe Asp Ile Arg Val Pro Phe Phe Ile
330 335 340

cgt ggt cca agt gta gaa cca gga tca ata gtc cca cag atc gtt ctc 1589
Arg Gly Pro Ser Val Glu Pro Gly Ser Ile Val Pro Gln Ile Val Leu
345 350 355 360

aac att gac ttg gcc ccc acg atc ctg gat att gct ggg ctc gac aca 1637
Asn Ile Asp Leu Ala Pro Thr Ile Leu Asp Ile Ala Gly Leu Asp Thr
365 370 375

cct cct gat gtg gac ggc aag tct gtc ctc aaa ctt ctg gac cca gaa 1685
Pro Pro Asp Val Asp Gly Lys Ser Val Leu Lys Leu Leu Asp Pro Glu
380 385 390

aag cca ggt aac agg ttt cga aca aac aag aag gcc aaa att tgg cgt 1733
Lys Pro Gly Asn Arg Phe Arg Thr Asn Lys Lys Ala Lys Ile Trp Arg
395 400 405

gat aca ttc cta gtg gaa aga ggc aaa ttt cta cgt aag aag gaa gaa 1781
Asp Thr Phe Leu Val Glu Arg Gly Lys Phe Leu Arg Lys Lys Glu Glu
410 415 420

tcc agc aag aat atc caa cag tca aat cac ttg ccc aaa tat gaa cgg 1829
Ser Ser Lys Asn Ile Gln Gln Ser Asn His Leu Pro Lys Tyr Glu Arg
425 430 435 440 

gtc aaa gaa cta tgc cag cag gcc agg tac cag aca gcc tgt gaa caa 1877
Val Lys Glu Leu Cys Gln Gln Ala Arg Tyr Gln Thr Ala Cys Glu Gln
445 450 455

ccg ggg cag aag tgg caa tgc att gag gat aca tct ggc aag ctt cga 1925
Pro Gly Gln Lys Trp Gln Cys Ile Glu Asp Thr Ser Gly Lys Leu Arg
460 465 470

att cac aag tgt aaa gga ccc agt gac ctg ctc aca gtc cgg cag agc 1973
Ile His Lys Cys Lys Gly Pro Ser Asp Leu Leu Thr Val Arg Gln Ser
475 480 485

acg cgg aac ctc tac gct cgc ggc ttc cat gac aaa gac aaa gag tgc 2021
Thr Arg Asn Leu Tyr Ala Arg Gly Phe His Asp Lys Asp Lys Glu Cys
490 495 500

agt tgt agg gag tct ggt tac cgt gcc agc aga agc caa aga aag agt 2069
Ser Cys Arg Glu Ser Gly Tyr Arg Ala Ser Arg Ser Gln Arg Lys Ser
505 510 515 520

caa cgg caa ttc ttg aga aac cag ggg act cca aag tac aag ccc aga 2117
Gln Arg Gln Phe Leu Arg Asn Gln Gly Thr Pro Lys Tyr Lys Pro Arg
525 530 535

ttt gtc cat act cgg cag aca cgt tcc ttg tcc gtc gaa ttt gaa ggt 2165
Phe Val His Thr Arg Gln Thr Arg Ser Leu Ser Val Glu Phe Glu Gly
540 545 550

gaa ata tat gac ata aat ctg gaa gaa gaa gaa gaa ttg caa gtg ttg 2213
Glu Ile Tyr Asp Ile Asn Leu Glu Glu Glu Glu Glu Leu Gln Val Leu
555 560 565

caa cca aga aac att gct aag cgt cat gat gaa ggc cac aag ggg cca 2261
Gln Pro Arg Asn Ile Ala Lys Arg His Asp Glu Gly His Lys Gly Pro
570 575 580

aga gat ctc cag gct tcc agt ggt ggc aac agg ggc agg atg ctg gca 2309
Arg Asp Leu Gln Ala Ser Ser Gly Gly Asn Arg Gly Arg Met Leu Ala
585 590 595 600

gat agc agc aac gcc gtg ggc cca cct acc act gtc cga gtg aca cac 2357
Asp Ser Ser Asn Ala Val Gly Pro Pro Thr Thr Val Arg Val Thr His
605 610 615

aag tgt ttt att ctt ccc aat gac tct atc cat tgt gag aga gaa ctg 2405
Lys Cys Phe Ile Leu Pro Asn Asp Ser Ile His Cys Glu Arg Glu Leu
620 625 630

tac caa tcg gcc aga gcg tgg aag gac cat aag gca tac att gac aaa 2453
Tyr Gln Ser Ala Arg Ala Trp Lys Asp His Lys Ala Tyr Ile Asp Lys
635 640 645

gag att gaa gct ctg caa gat aaa att aag aat tta aga gaa gtg aga 2501
Glu Ile Glu Ala Leu Gln Asp Lys Ile Lys Asn Leu Arg Glu Val Arg
650 655 660

gga cat ctg aag aga agg aag cct gag gaa tgt agc tgc agt aaa caa 2549
Gly His Leu Lys Arg Arg Lys Pro Glu Glu Cys Ser Cys Ser Lys Gln
665 670 675 680

agc tat tac aat aaa gag aaa ggt gta aaa aag caa gag aaa tta aag 2597
Ser Tyr Tyr Asn Lys Glu Lys Gly Val Lys Lys Gln Glu Lys Leu Lys
685 690 695

agc cat ctt cac cca ttc aag gag gct gct cag gaa gta gat agc aaa 2645
Ser His Leu His Pro Phe Lys Glu Ala Ala Gln Glu Val Asp Ser Lys
700 705 710

ctg caa ctt ttc aag gag aac aac cgt agg agg aag aag gag agg aag 2693
Leu Gln Leu Phe Lys Glu Asn Asn Arg Arg Arg Lys Lys Glu Arg Lys
715 720 725

gag aag aga cgg cag agg aag ggg gaa gag tgc agc ctg cct ggc ctc 2741
Glu Lys Arg Arg Gln Arg Lys Gly Glu Glu Cys Ser Leu Pro Gly Leu
730 735 740

act tgc ttc acg cat gac aac aac cac tgg cag aca gcc ccg ttc tgg 2789
Thr Cys Phe Thr His Asp Asn Asn His Trp Gln Thr Ala Pro Phe Trp
745 750 755 760

aac ctg gga tct ttc tgt gct tgc acg agt tct aac aat aac acc tac 2837
Asn Leu Gly Ser Phe Cys Ala Cys Thr Ser Ser Asn Asn Asn Thr Tyr
765 770 775

tgg tgt ttg cgt aca gtt aat gag acg cat aat ttt ctt ttc tgt gag 2885
Trp Cys Leu Arg Thr Val Asn Glu Thr His Asn Phe Leu Phe Cys Glu
780 785 790

ttt gct act ggc ttt ttg gag tat ttt gat atg aat aca gat cct tat 2933
Phe Ala Thr Gly Phe Leu Glu Tyr Phe Asp Met Asn Thr Asp Pro Tyr
795 800 805

cag ctc aca aat aca gtg cac acg gta gaa cga ggc att ttg aat cag 2981
Gln Leu Thr Asn Thr Val His Thr Val Glu Arg Gly Ile Leu Asn Gln
810 815 820

cta cac gta caa cta atg gag ctc aga agc tgt caa gga tat aag cag 3029
Leu His Val Gln Leu Met Glu Leu Arg Ser Cys Gln Gly Tyr Lys Gln
825 830 835 840

tgc aac cca aga cct aag aat ctt gat gtt gga aat aaa gat gga gga 3077
Cys Asn Pro Arg Pro Lys Asn Leu Asp Val Gly Asn Lys Asp Gly Gly
845 850 855

agc tat gac cta cac aga gga cag tta tgg gat gga tgg gaa ggt taa 3125
Ser Tyr Asp Leu His Arg Gly Gln Leu Trp Asp Gly Trp Glu Gly *
860 865 870 

tcagccccgt ctcactgcag acatcaactg gcaaggccta gaggagctac acagtgtgaa 3185
tgaaaacatc tatgagtaca gacaaaacta cagacttagt ctggtggact ggactaatta 3245
cttgaaggat ttagatagag tatttgcact gctgaagagt cactatgagc aaaataaaac 3305
aaataagact caaactgctc aaagtgacgg gttcttggtt gtctctgctg agcacgctgt 3365
gtcaatggag atggcctctg ctgactcaga tgaagaccca aggcataagg ttgggaaaac 3425
acctcatttg accttgccag ctgaccttca aaccctgcat ttgaaccgac caacattaag 3485
tccagagagt aaacttgaat ggaataacga cattccagaa gttaatcatt tgaattctga 3545
acactggaga aaaaccgaaa aatggacggg gcatgaagag actaatcatc tggaaaccga 3605
tttcagtggc gatggcatga cagagctaga gctcgggccc agccccaggc tgcagcccat 3665
tcgcaggcac ccgaaagaac ttccccagta tggtggtcct ggaaaggaca tttttgaaga 3725
tcaactatat cttcctgtgc attccgatgg aatttcagtt catcagatgt tcaccatggc 3785
caccgcagaa caccgaagta attccagcat agcggggaag atgttgacca aggtggagaa 3845
gaatcacgaa aaggagaagt cacagcacct agaaggcagc gcctcctctt cactctcctc 3905
tgattagatg aaactgttac cttaccctaa acacagtatt tctttttaac ttttttattt 3965
gtaaactaat aaaggkaatc acagccacca acattccaag ctaccctggg tacctttgtg 4025
cagtagaagc tagtgagcat gtgagcaagc ggtgtgcaca cggagactca tcgttataat 4085
ttactatctg ccaaggagta gaaagaaagg ctggggatat ttgggttggc tttggktttg 4145
attttttgct tggttggttg gtttgkacta aaacagtatt atcttttgaa tatcgtaggg 4205
acataarkww wwwmmwkktw wtcraawyinra kakgsywrra wkgggstyty tskkrkstmw 4265
amwykwscmc cyskkrwwaw tywywinmywc raykytssstg rykrnktaat gaagtt 4321



<210> 5 
<211> 569 
<212> PRT 

<213> homo sapiens 



<400> 5 

Met His Thr Leu Thr Gly Phe Ser Leu Val Ser Leu Leu Ser Phe Gly
1 5 10 15

Tyr Leu Ser Trp Asp Trp Ala Lys Pro Ser Phe Val Ala Asp Gly Pro
20 25 30

Gly Glu Ala Gly Glu Gln Pro Ser Ala Ala Pro Pro Gln Pro Pro His
35 40 45

Ile Ile Phe Ile Leu Thr Asp Asp Gln Gly Tyr His Asp Val Gly Tyr
50 55 60

His Gly Ser Asp Ile Glu Thr Pro Thr Leu Asp Arg Leu Ala Ala Lys
65 70 75 80

Gly Val Lys Leu Glu Asn Tyr Tyr Ile Gln Pro Ile Cys Thr Pro Ser
85 90 95

Arg Ser Gln Leu Leu Thr Gly Arg Tyr Gln Ile His Thr Gly Leu Gln
100 105 110

His Ser Ile Ile Arg Pro Gln Gln Pro Asn Cys Leu Pro Leu Asp Gln
115 120 125

Val Thr Leu Pro Gln Lys Leu Gln Glu Ala Gly Tyr Ser Thr His Met
130 135 140

Val Gly Lys Trp His Leu Gly Phe Tyr Arg Lys Glu Cys Leu Pro Thr
145 150 155 160

Arg Arg Gly Phe Asp Thr Phe Leu Gly Ser Leu Thr Gly Asn Val Asp
165 170 175

Tyr Tyr Thr Tyr Asp Asn Cys Asp Gly Pro Gly Val Cys Gly Phe Asp
180 185 190

Leu His Glu Gly Glu Asn Val Ala Trp Gly Leu Ser Gly Gln Tyr Ser
195 200 205

Thr Met Leu Tyr Ala Gln Arg Ala Ser His Ile Leu Ala Ser His Ser
210 215 220

Pro Gln Arg Pro Leu Phe Leu Tyr Val Ala Phe Gln Ala Val His Thr
225 230 235 240

Pro Leu Gln Ser Pro Arg Glu Tyr Leu Tyr Arg Tyr Arg Thr Met Gly
245 250 255

Asn Val Ala Arg Arg Lys Tyr Ala Ala Met Val Thr Cys Met Asp Glu
260 265 270

Ala Val Arg Asn Ile Thr Trp Ala Leu Lys Arg Tyr Gly Phe Tyr Asn
275 280 285

Asn Ser Val Ile Ile Phe Ser Ser Asp Asn Gly Gly Gln Thr Phe Ser
290 295 300

Gly Gly Ser Asn Trp Pro Leu Arg Gly Arg Lys Gly Thr Tyr Trp Glu
305 310 315 320

Gly Gly Val Arg Gly Leu Gly Phe Val His Ser Pro Leu Leu Lys Arg
325 330 335

Lys Gln Arg Thr Ser Arg Ala Leu Met His Ile Thr Asp Trp Tyr Pro
340 345 350

Thr Leu Val Gly Leu Ala Gly Gly Thr Thr Ser Ala Ala Asp Gly Leu
355 360 365

Asp Gly Tyr Asp Val Trp Pro Ala Ile Ser Glu Gly Arg Ala Ser Pro
370 375 380

Arg Thr Glu Ile Leu His Asn Ile Asp Pro Leu Tyr Asn His Ala Gln
385 390 395 400

His Gly Ser Leu Glu Gly Gly Phe Gly Ile Trp Asn Thr Ala Val Gln
405 410 415

Ala Ala Ile Arg Val Gly Glu Trp Lys Leu Leu Thr Gly Asp Pro Gly
420 425 430

Tyr Gly Asp Trp Ile Pro Pro Gln Thr Leu Ala Thr Phe Pro Gly Ser
435 440 445

Trp Trp Asn Leu Glu Arg Met Ala Ser Val Arg Gln Ala Val Trp Leu
450 455 460

Phe Asn Ile Ser Ala Asp Pro Tyr Glu Arg Glu Asp Leu Ala Gly Gln
465 470 475 480

Arg Pro Asp Val Val Arg Thr Leu Leu Ala Arg Leu Ala Glu Tyr Asn
485 490 495

Arg Thr Ala Ile Pro Val Arg Tyr Pro Ala Glu Asn Pro Arg Ala His
500 505 510

Pro Asp Phe Asn Gly Gly Ala Trp Gly Pro Trp Ala Ser Asp Glu Glu
515 520 525

Glu Glu Glu Glu Glu Gly Arg Ala Arg Ser Phe Ser Arg Gly Arg Arg
530 535 540

Lys Lys Lys Cys Lys Ile Cys Lys Leu Arg Ser Phe Phe Arg Lys Leu
545 550 555 560

Asn Thr Arg Leu Met Ser Gln Arg Ile
565

<210> 6 

<211> 2940 

<212> DNA 

<213> homo sapiens 

<220> 
<221> CDS 

<222> (334)...(2043)
<400> 6 

ccacgcgtcc gcccacgcgt ccggctgcca cgccgcgtct caggctggcc gggctgagcc 60
ggggaagagg gagcaaaggc ggcgcagggc ctgcgcttag gcagcgggag gcagctcggc 120
gcgggcctga cctccccaga gcgccccgct gcggccgagc agatccggcc cagccgtccg 180
gcagccagtc ccggaccaga cactggaccg tccccggggg gcgctgaact ccctcgcagc 240
atccgagccg gcgggccggt ggtgcgccct gggcgcgcga ggtggtgagg ccccaggagc 300
ccggcgcgcc gggacacgcg ggccggcttg gcg atg cac acc ctc act ggc ttc 354
Met His Thr Leu Thr Gly Phe
1 5

tct ctg gtc agc ctg ctc agc ttc ggc tac ctg tcc tgg gac tgg gcc 402
Ser Leu Val Ser Leu Leu Ser Phe Gly Tyr Leu Ser Trp Asp Trp Ala
10 15 20

aag ccg agc ttc gtg gcc gac ggg ccc ggg gag gct ggc gag cag ccc 450
Lys Pro Ser Phe Val Ala Asp Gly Pro Gly Glu Ala Gly Glu Gln Pro
25 30 35

tcg gcc gct ccg ccc cag cct ccc cac atc atc ttc atc ctc acg gac 498
Ser Ala Ala Pro Pro Gln Pro Pro His Ile Ile Phe Ile Leu Thr Asp
40 45 50 55

gac caa ggc tac cac gac gtg ggc tac cat ggt tca gat atc gag acc 546
Asp Gln Gly Tyr His Asp Val Gly Tyr His Gly Ser Asp Ile Glu Thr
60 65 70

cct acg ctg gac agg ctg gcg gcc aag ggg gtc aag ttg gag aat tat 594
Pro Thr Leu Asp Arg Leu Ala Ala Lys Gly Val Lys Leu Glu Asn Tyr
75 80 85

tac atc cag ccc atc tgc acg cct tcg cgg agc cag ctc ctc act ggc 642
Tyr Ile Gln Pro Ile Cys Thr Pro Ser Arg Ser Gln Leu Leu Thr Gly
90 95 100

agg tac cag atc cac aca gga ctc cag cat tcc atc atc cgc cca cag 690
Arg Tyr Gln Ile His Thr Gly Leu Gln His Ser Ile Ile Arg Pro Gln
105 110 115

cag ccc aac tgc ctg ccc ctg gac cag gtg aca ctg cca cag aag ctg 738
Gln Pro Asn Cys Leu Pro Leu Asp Gln Val Thr Leu Pro Gln Lys Leu
120 125 130 135

cag gag gca ggt tat tcc acc cat atg gtg ggc aag tgg cac ctg ggc 786
Gln Glu Ala Gly Tyr Ser Thr His Met Val Gly Lys Trp His Leu Gly
140 145 150

ttc tac cgg aag gag tgt ctg ccc acc cgt cgg ggc ttc gac acc ttc 834
Phe Tyr Arg Lys Glu Cys Leu Pro Thr Arg Arg Gly Phe Asp Thr Phe
155 160 165

ctg ggc tcg ctc acg ggc aat gtg gac tat tac acc tat gac aac tgt 882
Leu Gly Ser Leu Thr Gly Asn Val Asp Tyr Tyr Thr Tyr Asp Asn Cys
170 175 180

gat ggc cca ggc gtg tgc ggc ttc gac ctg cac gag ggt gag aat gtg 930
Asp Gly Pro Gly Val Cys Gly Phe Asp Leu His Glu Gly Glu Asn Val
185 190 195

gcc tgg ggg ctc agc ggc cag tac tcc act atg ctt tac gcc cag cgc 978
Ala Trp Gly Leu Ser Gly Gln Tyr Ser Thr Met Leu Tyr Ala Gln Arg
200 205 210 215

gcc agc cat atc ctg gcc agc cac agc cct cag cgt ccc ctc ttc ctc 1026
Ala Ser His Ile Leu Ala Ser His Ser Pro Gln Arg Pro Leu Phe Leu
220 225 230

tat gtg gcc ttc cag gca gta cac aca ccc ctg cag tcc cct cgt gag 1074
Tyr Val Ala Phe Gln Ala Val His Thr Pro Leu Gln Ser Pro Arg Glu
235 240 245

tac ctg tac cgc tac cgc acc atg ggc aat gtg gcc cgg cgg aag tac 1122
Tyr Leu Tyr Arg Tyr Arg Thr Met Gly Asn Val Ala Arg Arg Lys Tyr
250 255 260

gcg gcc atg gtg acc tgc atg gat gag gct gtg cgc aac atc acc tgg 1170
Ala Ala Met Val Thr Cys Met Asp Glu Ala Val Arg Asn Ile Thr Trp
265 270 275

gcc ctc aag cgc tac ggt ttc tac aac aac agt gtc atc atc ttc tcc 1218
Ala Leu Lys Arg Tyr Gly Phe Tyr Asn Asn Ser Val Ile Ile Phe Ser
280 285 290 295

agt gac aat ggt ggc cag act ttc tcg ggg ggc agc aac tgg ccg ctc 1266
Ser Asp Asn Gly Gly Gln Thr Phe Ser Gly Gly Ser Asn Trp Pro Leu
300 305 310

cga gga cgc aag ggc act tat tgg gaa ggt ggc gtg cgg ggc cta ggc 1314
Arg Gly Arg Lys Gly Thr Tyr Trp Glu Gly Gly Val Arg Gly Leu Gly
315 320 325

ttt gtc cac agt ccc ctg ctc aag cga aag caa cgg aca agc cgg gca 1362
Phe Val His Ser Pro Leu Leu Lys Arg Lys Gln Arg Thr Ser Arg Ala
330 335 340

ctg atg cac atc act gac tgg tac ccg acc ctg gtg ggt ctg gca ggt 1410
Leu Met His Ile Thr Asp Trp Tyr Pro Thr Leu Val Gly Leu Ala Gly
345 350 355

ggt acc acc tca gca gcc gat ggg cta gat ggc tac gac gtg tgg ccg 1458
Gly Thr Thr Ser Ala Ala Asp Gly Leu Asp Gly Tyr Asp Val Trp Pro
360 365 370 375

gcc atc agc gag ggc cgg gcc tca cca cgc acg gag atc ctg cac aac 1506
Ala Ile Ser Glu Gly Arg Ala Ser Pro Arg Thr Glu Ile Leu His Asn
380 385 390

att gac cca ctc tac aac cat gcc cag cat ggc tcc ctg gag ggc ggc 1554
Ile Asp Pro Leu Tyr Asn His Ala Gln His Gly Ser Leu Glu Gly Gly
395 400 405

ttt ggc atc tgg aac acc gcc gtg cag gct gcc atc cgc gtg ggt gag 1602
Phe Gly Ile Trp Asn Thr Ala Val Gln Ala Ala Ile Arg Val Gly Glu
410 415 420

tgg aag ctg ctg aca gga gac ccc ggc tat ggc gat tgg atc cca ccg 1650
Trp Lys Leu Leu Thr Gly Asp Pro Gly Tyr Gly Asp Trp Ile Pro Pro
425 430 435

cag aca ctg gcc acc ttc ccg ggt agc tgg tgg aac ctg gaa cga atg 1698
Gln Thr Leu Ala Thr Phe Pro Gly Ser Trp Trp Asn Leu Glu Arg Met
440 445 450 455

gcc agt gtc cgc cag gcc gtg tgg ctc ttc aac atc agt gct gac cct 1746
Ala Ser Val Arg Gln Ala Val Trp Leu Phe Asn Ile Ser Ala Asp Pro
460 465 470

tat gaa cgg gag gac ctg gct ggc cag cgg cct gat gtg gtc cgc acc 1794
Tyr Glu Arg Glu Asp Leu Ala Gly Gln Arg Pro Asp Val Val Arg Thr
475 480 485

ctg ctg gct cgc ctg gcc gaa tat aac cgc aca gcc atc ccg gta cgc 1842
Leu Leu Ala Arg Leu Ala Glu Tyr Asn Arg Thr Ala Ile Pro Val Arg
490 495 500

tac cca gct gag aac ccc cgg gct cat cct gac ttt aat ggg ggt gct 1890
Tyr Pro Ala Glu Asn Pro Arg Ala His Pro Asp Phe Asn Gly Gly Ala
505 510 515

tgg ggg ccc tgg gcc agt gat gag gaa gag gag gaa gag gaa ggg agg 1938
Trp Gly Pro Trp Ala Ser Asp Glu Glu Glu Glu Glu Glu Glu Gly Arg
520 525 530 535

gct cga agc ttc tcc cgg ggt cgt cgc aag aaa aaa tgc aag att tgc 1986
Ala Arg Ser Phe Ser Arg Gly Arg Arg Lys Lys Lys Cys Lys Ile Cys
540 545 550

aag ctt cga tcc ttt ttc cgt aaa ctc aac acc agg cta atg tcc caa 2034
Lys Leu Arg Ser Phe Phe Arg Lys Leu Asn Thr Arg Leu Met Ser Gln
555 560 565

cgg atc tga tggtggggag ggagaaaact gtcctttaga ggatcttccc 2083
Arg Ile *

cactccggct tggccctgct gtttctcagg gagaagcctg tcacatctcc atctacaggg 2143
agttggaggg tgtagagtcc cttggttgaa cagggtaggg agcctggata ggagtgggtg 2203
ggaataaacc agactgggat gcctgtgtct cagtcctgcc tcctcacgga cttgctctgt 2263
gacctcaggt gacccacatg agcttttagc ctcagtttcc tcatctgtaa aatgagctct 2323
aatgactttg tgactctttg gtgtggccct ggagcctggg gccacggtgg agttcctggc 2383
cggccttgcc acttgacaac tcctttaagg cttccccctt aacacgggat ccctgtggtg 2443
gtgtttggga gttgcctgga ggcaactcca agcctggccc ccagctgaag catggcaatc 2503
tggctgctct ctacagggac ccccaagcgc tgtgggtgga gggcaggggt cgggggggtt 2563
gaccttcttg ggtcttcaca tggcctaggc cagtcctccg gtcagactgg tgtcaggcac 2623
cgtggtgcaa aattcctctt ctggcccctc cagtacccag agaaactggc tgggccatta 2683
actgctgcag caccaagggt ggtagaaaga gctgtgaaga gcccccaaac cagtaccagg 2743
acacctgggt tctcctgtga cctggggcac agttcttgcc ctctaggcct tgatttcccc 2803
acctgcaagt ggggatgcca gccctggctc tgcctccttc atgaggctct ggaagactgg 2863
ccaaggttgt ggaggagctt gtgaacttga ttaaagtgtc gtaacatgga aaaaaaaaaa 2923
aaaaaaaaaa agggcgg 2940

<210> 7 
<211> 599 
<212> PRT 

<213> homo sapiens 
<400> 7 

Met Ala Pro Arg Gly Cys Ala Gly His Pro Pro Pro Pro Ser Pro Gln
1 5 10 15

Ala Cys Val Cys Pro Gly Lys Met Leu Ala Met Gly Ala Leu Ala Gly
20 25 30

Phe Trp Ile Leu Cys Leu Leu Thr Tyr Gly Tyr Leu Ser Trp Gly Gln
35 40 45

Ala Leu Glu Glu Glu Glu Glu Gly Ala Leu Leu Ala Gln Ala Gly Glu
50 55 60

Lys Leu Glu Pro Ser Thr Thr Ser Thr Ser Gln Pro His Leu Ile Phe
65 70 75 80

Ile Leu Ala Asp Asp Gln Gly Phe Arg Asp Val Gly Tyr His Gly Ser
85 90 95

Glu Ile Lys Thr Pro Thr Leu Asp Lys Leu Ala Ala Glu Gly Val Lys
100 105 110

Leu Glu Asn Tyr Tyr Val Gln Pro Ile Cys Thr Pro Ser Arg Ser Gln
115 120 125

Phe Ile Thr Gly Lys Tyr Gln Ile His Thr Gly Leu Gln His Ser Ile
130 135 140

Ile Arg Pro Thr Gln Pro Asn Cys Leu Pro Leu Asp Asn Ala Thr Leu
145 150 155 160

Pro Gln Lys Leu Lys Glu Val Gly Tyr Ser Thr His Met Val Gly Lys
165 170 175

Trp His Leu Gly Phe Tyr Arg Lys Glu Cys Met Pro Thr Arg Arg Gly
180 185 190

Phe Asp Thr Phe Phe Gly Ser Leu Leu Gly Ser Gly Asp Tyr Tyr Thr
195 200 205

His Tyr Lys Cys Asp Ser Pro Gly Met Cys Gly Tyr Asp Leu Tyr Glu
210 215 220

Asn Asp Asn Ala Ala Trp Asp Tyr Asp Asn Gly Ile Tyr Ser Thr Gln
225 230 235 240

Met Tyr Thr Gln Arg Val Gln Gln Ile Leu Ala Ser His Asn Pro Thr
245 250 255

Lys Pro Ile Phe Leu Tyr Ile Ala Tyr Gln Ala Val His Ser Pro Leu
260 265 270

Gln Ala Pro Gly Arg Tyr Phe Glu His Tyr Arg Ser Ile Ile Asn Ile
275 280 285

Asn Arg Arg Arg Tyr Ala Ala Met Leu Ser Cys Leu Asp Glu Ala Ile
290 295 300

Asn Asn Val Thr Leu Ala Leu Lys Thr Tyr Gly Phe Tyr Asn Asn Ser
305 310 315 320

Ile Ile Ile Tyr Ser Ser Asp Asn Gly Gly Gln Pro Thr Ala Gly Gly
325 330 335

Ser Asn Trp Pro Leu Arg Gly Ser Lys Gly Thr Tyr Trp Glu Gly Gly
340 345 350

Ile Arg Ala Val Gly Phe Val His Ser Pro Leu Leu Lys Asn Lys Gly
355 360 365

Thr Val Cys Lys Glu Leu Val His Ile Thr Asp Trp Tyr Pro Thr Leu
370 375 380

Ile Ser Leu Ala Glu Gly Gln Ile Asp Glu Asp Ile Gln Leu Asp Gly
385 390 395 400

Tyr Asp Ile Trp Glu Thr Ile Ser Glu Gly Leu Arg Ser Pro Arg Val
405 410 415

Asp Ile Leu His Asn Ile Asp Pro Ile Tyr Thr Lys Ala Lys Asn Gly
420 425 430

Ser Trp Ala Ala Gly Tyr Gly Ile Trp Asn Thr Ala Ile Gln Ser Ala
435 440 445

Ile Arg Val Gln His Trp Lys Leu Leu Thr Gly Asn Pro Gly Tyr Ser
450 455 460

Asp Trp Val Pro Pro Gln Ser Phe Ser Asn Leu Gly Pro Asn Arg Trp
465 470 475 480

His Asn Glu Arg Ile Thr Leu Ser Thr Gly Lys Ser Val Trp Leu Phe
485 490 495

Asn Ile Thr Ala Asp Pro Tyr Glu Arg Val Asp Leu Ser Asn Arg Tyr
500 505 510

Pro Gly Ile Val Lys Lys Leu Leu Arg Arg Leu Ser Gln Phe Asn Lys
515 520 525

Thr Ala Val Pro Val Arg Tyr Pro Pro Lys Asp Pro Arg Ser Asn Pro
530 535 540

Arg Leu Asn Gly Gly Val Trp Gly Pro Trp Tyr Lys Glu Glu Thr Lys
545 550 555 560

Lys Lys Lys Pro Ser Lys Asn Gln Ala Glu Lys Lys Gln Lys Lys Ser
565 570 575

Lys Lys Lys Lys Lys Lys Gln Gln Lys Ala Val Ser Gly Ser Thr Cys
580 585 590

His Ser Gly Val Thr Cys Gly
595

<210> 8 

<211> 2253 

<212> DNA 

<213> homo sapiens 

<220> 
<221> CDS 

<222> (324)...(2123)
<400> 8 

cacgcgtccg cccacgcgtc cgtggagata ttaacttttt tctttttttt tttccttggt 60 
ggaagctgct ctagggaggg gggaggagga ggagaaagtg aaatgtgctg gagaagagcg 120 
agccctcctt gttcttccgg agtcccatcc attaagccat cacttctgga agattaaagt 180 
tgtcggacat ggtgacagct gagaggagag gaggatttct tgccaggtgg agagtcttca 240 
ccgtctgttg ggtgcatgtg tgcgcccgca gcggcgcggg gcgcgtggtt ctccgcgtgg 300 
agtctcacct gggacctgag tga atg gct ccc agg ggc tgt gcg ggg cat ccg 353
Met Ala Pro Arg Gly Cys Ala Gly His Pro
1 5 10

cct ccg cct tct cca cag gcc tgt gtc tgt cct gga aag atg cta gca 401
Pro Pro Pro Ser Pro Gln Ala Cys Val Cys Pro Gly Lys Met Leu Ala
15 20 25

atg ggg gcg ctg gca gga ttc tgg atc ctc tgc ctc ctc act tat ggt 449
Met Gly Ala Leu Ala Gly Phe Trp Ile Leu Cys Leu Leu Thr Tyr Gly
30 35 40

tac ctg tcc tgg ggc cag gcc tta gaa gag gag gaa gaa ggg gcc tta 497
Tyr Leu Ser Trp Gly Gln Ala Leu Glu Glu Glu Glu Glu Gly Ala Leu
45 50 55

cta gct caa gct gga gag aaa cta gag ccc agc aca act tcc acc tcc 545
Leu Ala Gln Ala Gly Glu Lys Leu Glu Pro Ser Thr Thr Ser Thr Ser
60 65 70

cag ccc cat ctc att ttc atc cta gcg gat gat cag gga ttt aga gat 593
Gln Pro His Leu Ile Phe Ile Leu Ala Asp Asp Gln Gly Phe Arg Asp
75 80 85 90

gtg ggt tac cac gga tct gag att aaa aca cct act ctt gac aag ctc 641
Val Gly Tyr His Gly Ser Glu Ile Lys Thr Pro Thr Leu Asp Lys Leu
95 100 105

gct gcc gaa gga gtt aaa ctg gag aac tac tat gtc cag cct att tgc 689
Ala Ala Glu Gly Val Lys Leu Glu Asn Tyr Tyr Val Gln Pro Ile Cys
110 115 120

aca cca tcc agg agt cag ttt att act gga aag tat cag ata cac acc 737
Thr Pro Ser Arg Ser Gln Phe Ile Thr Gly Lys Tyr Gln Ile His Thr
125 130 135

gga ctt caa cat tct atc ata aga cct acc caa ccc aac tgt tta cct 785
Gly Leu Gln His Ser Ile Ile Arg Pro Thr Gln Pro Asn Cys Leu Pro
140 145 150

ctg gac aat gcc acc cta cct cag aaa ctg aag gag gtt gga tat tca 833
Leu Asp Asn Ala Thr Leu Pro Gln Lys Leu Lys Glu Val Gly Tyr Ser
155 160 165 170

acg cat atg gtc gga aaa tgg cac ttg ggt ttt tac aga aaa gaa tgc 881
Thr His Met Val Gly Lys Trp His Leu Gly Phe Tyr Arg Lys Glu Cys
175 180 185

atg ccc acc aga aga gga ttt gat acc ttt ttt ggt tcc ctt ttg gga 929
Met Pro Thr Arg Arg Gly Phe Asp Thr Phe Phe Gly Ser Leu Leu Gly
190 195 200

agt ggg gat tac tat aca cac tac aaa tgt gac agt cct ggg atg tgt 977
Ser Gly Asp Tyr Tyr Thr His Tyr Lys Cys Asp Ser Pro Gly Met Cys
205 210 215

ggc tat gac ttg tat gaa aac gac aat gct gcc tgg gac tat gac aat 1025
Gly Tyr Asp Leu Tyr Glu Asn Asp Asn Ala Ala Trp Asp Tyr Asp Asn
220 225 230

ggc ata tac tcc aca cag atg tac act cag aga gta cag caa atc tta 1073
Gly Ile Tyr Ser Thr Gln Met Tyr Thr Gln Arg Val Gln Gln Ile Leu
235 240 245 250

gct tcc cat aac ccc aca aag cct ata ttt tta tat att gcc tat caa 1121
Ala Ser His Asn Pro Thr Lys Pro Ile Phe Leu Tyr Ile Ala Tyr Gln
255 260 265

gct gtt cat tca cca ctg caa gct cct ggc agg tat ttc gaa cac tac 1169
Ala Val His Ser Pro Leu Gln Ala Pro Gly Arg Tyr Phe Glu His Tyr
270 275 280

cga tcc att atc aac ata aac agg agg aga tat gct gcc atg ctt tcc 1217
Arg Ser Ile Ile Asn Ile Asn Arg Arg Arg Tyr Ala Ala Met Leu Ser
285 290 295

tgc tta gat gaa gca atc aac aac gtg aca ttg gct cta aag act tat 1265
Cys Leu Asp Glu Ala Ile Asn Asn Val Thr Leu Ala Leu Lys Thr Tyr
300 305 310

ggt ttc tat aac aac agc att atc att tac tct tca gat aat ggt ggc 1313
Gly Phe Tyr Asn Asn Ser Ile Ile Ile Tyr Ser Ser Asp Asn Gly Gly
315 320 325 330

cag cct acg gca gga ggg agt aac tgg cct ctc aga ggt agc aaa gga 1361
Gln Pro Thr Ala Gly Gly Ser Asn Trp Pro Leu Arg Gly Ser Lys Gly
335 340 345

aca tat tgg gaa gga ggg atc cgg gct gta ggc ttt gtg cat agc cca 1409
Thr Tyr Trp Glu Gly Gly Ile Arg Ala Val Gly Phe Val His Ser Pro
350 355 360

ctt ctg aaa aac aag gga aca gtg tgt aag gaa ctt gtg cac atc act 1457
Leu Leu Lys Asn Lys Gly Thr Val Cys Lys Glu Leu Val His Ile Thr
365 370 375

gac tgg tac ccc act ctc att tca ctg gct gaa gga cag att gat gag 1505
Asp Trp Tyr Pro Thr Leu Ile Ser Leu Ala Glu Gly Gln Ile Asp Glu
380 385 390

gac att caa cta gat ggc tat gat atc tgg gag acc ata agt gag ggt 1553
Asp Ile Gln Leu Asp Gly Tyr Asp Ile Trp Glu Thr Ile Ser Glu Gly
395 400 405 410

ctt cgc tca ccc cga gta gat att ttg cat aac att gac ccc ata tac 1601
Leu Arg Ser Pro Arg Val Asp Ile Leu His Asn Ile Asp Pro Ile Tyr
415 420 425

acc aag gca aaa aat ggc tcc tgg gca gca ggc tat ggg atc tgg aac 1649
Thr Lys Ala Lys Asn Gly Ser Trp Ala Ala Gly Tyr Gly Ile Trp Asn
430 435 440

act gca atc cag tca gcc atc aga gtg cag cac tgg aaa ttg ctt aca 1697
Thr Ala Ile Gln Ser Ala Ile Arg Val Gln His Trp Lys Leu Leu Thr
445 450 455

gga aat cct ggc tac agc gac tgg gtc ccc cct cag tct ttc agc aac 1745
Gly Asn Pro Gly Tyr Ser Asp Trp Val Pro Pro Gln Ser Phe Ser Asn
460 465 470

ctg gga ccg aac cgg tgg cac aat gaa cgg atc acc ttg tca act ggc 1793
Leu Gly Pro Asn Arg Trp His Asn Glu Arg Ile Thr Leu Ser Thr Gly
475 480 485 490

aaa agt gta tgg ctt ttc aac atc aca gcc gac cca tat gag agg gtg 1841
Lys Ser Val Trp Leu Phe Asn Ile Thr Ala Asp Pro Tyr Glu Arg Val
495 500 505

gac cta tct aac agg tat cca gga atc gtg aag aag ctc cta cgg agg 1889
Asp Leu Ser Asn Arg Tyr Pro Gly Ile Val Lys Lys Leu Leu Arg Arg
510 515 520

ctc tca cag ttc aac aaa act gca gtg ccg gtc agg tat ccc ccc aaa 1937
Leu Ser Gln Phe Asn Lys Thr Ala Val Pro Val Arg Tyr Pro Pro Lys
525 530 535

gac ccc aga agt aac cct agg ctc aat gga ggg gtc tgg gga cca tgg 1985
Asp Pro Arg Ser Asn Pro Arg Leu Asn Gly Gly Val Trp Gly Pro Trp
540 545 550

tat aaa gag gaa acc aag aaa aag aag cca agc aaa aat cag gct gag 2033
Tyr Lys Glu Glu Thr Lys Lys Lys Lys Pro Ser Lys Asn Gln Ala Glu
555 560 565 570

aaa aag caa aag aaa agc aaa aaa aag aag aag aaa cag cag aaa gca 2081
Lys Lys Gln Lys Lys Ser Lys Lys Lys Lys Lys Lys Gln Gln Lys Ala
575 580 585

gtc tca ggt tca act tgc cat tca ggt gtt act tgt gga taa 2123
Val Ser Gly Ser Thr Cys His Ser Gly Val Thr Cys Gly *
590 595

gcacaaatat ttcctgtttg gttaaacttt aatcagttct tatctttcat ctgtttccta 2183
ggtaaaccag caaatttggc tcgataatat cgctggccta agcgtcaggc ttgttttcat 2243
gctgtgccac 2253

<210> 9 
<211> 552 
<212> PRT 

<213> Artificial Sequence 






WO 01/55411 PCT/US01/03266



<220> 

<223> Pfam consensus sequence for human sulfatase 
<400> 9 

Pro Asn Ile Leu Leu Ile Leu Ala Asp Asp Leu Gly Ile Gly Asp Leu
1 5 10 15

Gly Cys Tyr Gly Asn Pro Thr Ile Arg Thr Pro Asn Ile Asp Arg Leu
20 25 30

Ala Glu Glu Gly Leu Arg Phe Thr Asn Ala Tyr Val Thr Thr Pro Leu
35 40 45

Cys Thr Pro Ser Arg Ala Ala Leu Leu Thr Gly Arg Tyr Pro His Arg
50 55 60

Thr Gly Met Tyr Thr Asn Asn Arg Ala Gly Val Leu Pro Phe Thr Gly
65 70 75 80

Trp Ser Leu Glu Gly Gly Leu Pro Leu Asp Glu Thr Thr Leu Pro Glu
85 90 95

Leu Leu Lys Glu Ala Gly Tyr Ala Thr Gly Met Val Gly Lys Trp His
100 105 110

Gly Tyr Asn Glu Glu Ser Ser Ala Ser Asp Phe Ala His Leu Pro Leu
115 120 125

Gly Arg Gly Phe Asp Tyr Phe Tyr Gly Asn Leu Gly Gly Glu Asp Gln
130 135 140

Trp Tyr Pro Leu Val Asp Ala Leu Leu Pro Phe Thr Asn Asp Thr Tyr
145 150 155 160

Thr Cys Glu Gly Gly Tyr Gly Phe Ser Lys Asp Val Ala Leu Lys Pro
165 170 175

Leu Gly Ala Leu Gly Val Asn Glu Val Glu Ala Pro Asp Lys Ala Leu
180 185 190

Ala Asp Tyr Lys Thr Ala Gly Ala Leu Asn Val Pro His His Val Phe
195 200 205

Glu Trp Ala Asp Arg Tyr Ala Gly Ala Val Asp Val Gly Arg Pro Phe
210 215 220

Leu Ala Val Leu Ile Phe Pro Arg Pro Ala Ala Cys Phe Leu Tyr Pro
225 230 235 240

Asn Ala Thr Val Val Ser Gln Pro Met Pro His Ser Pro Leu Thr Ala
245 250 255

Pro Arg Pro Trp Gln Leu Leu Ala Asp Glu Ala Leu Pro Phe Leu Glu
260 265 270

Arg Asn Gly Gln Arg Asp Lys Pro Phe Phe Leu Tyr Leu Ser Tyr Lys
275 280 285

His Val His Ile Pro Arg Asp Ala Pro Met Leu Phe Ser Ser Lys Asp
290 295 300

Phe Ala Gly Ser Ser Arg Arg Gly Leu Tyr Gly Leu Ile Leu Asp Ser
305 310 315 320

Val Glu Glu Met Asp Asp Gly Val Gly Arg Val Leu Asn Ala Leu Asp
325 330 335

Glu Leu Asn Gly Leu Leu Asp Asn Thr Leu Ile Ile Phe Thr Ser Leu
340 345 350

Leu Asp His Gly Gly His Leu Gly Ala His Gly His Leu Gly Ile Arg
355 360 365

Ala Gly Gly Ser Asn Gly Pro Phe Arg Gly Gly Lys Gly Thr Asn Leu
370 375 380

Tyr Glu Gly Gly Thr Arg Val Pro Leu Ile Val Arg Trp Pro Glu Gly
385 390 395 400

Ile Ile Ala Pro Gly Gln Val Ser Asp Glu Leu Val Ser Leu Met Asp
405 410 415

Leu Phe Pro Thr Ile Leu Asp Leu Ala Gly Ala Pro Leu Pro Gly Val
420 425 430

Ala Ala Gly Val Lys Asp Arg Ile Leu Asp Gly Val Ser Leu Leu Pro
435 440 445

Leu Leu Leu Gly Ala Ala Gly Ser Ser Arg His Glu Thr Leu Phe Tyr
450 455 460

Glu Ser Tyr Cys Asn Glu Gly Arg Gly Phe Leu Pro Ala Val Arg Trp
465 470 475 480

Gly Lys Lys Lys Ala His Phe Arg Thr Pro Asn Ile Ala Gly Trp Gln
485 490 495

Arg Val Asp Phe Asp Asp Val Trp Lys Leu Phe Asn Thr Val Glu Asp
500 505 510

Phe Asn Arg Ser Gly Asp Asp Ala Cys Arg His Gly Asp Val Cys Lys
515 520 525

Cys Leu Gly Lys Pro Arg Arg Ser Val Thr His His Asp Pro Pro Leu
530 535 540

Leu Tyr Asp Leu Ser Arg Asp Pro
545 550

<210> 10 
<211> 520 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Pfam consensus sequence for human sulfatase 

<400> 10 

Pro Asn Val Leu Leu Ile Leu Ala Asp Asp Leu Gly Ile Gly Asp Leu
1 5 10 15

Gly Cys Tyr Gly His Pro Thr Ile Arg Thr Pro Asn Leu Asp Arg Leu
20 25 30

Ala Glu Glu Gly Leu Arg Phe Thr Asn His Tyr Thr Ala Thr Pro Leu
35 40 45

Cys Ser Pro Ser Arg Ala Ala Leu Leu Thr Gly Arg Tyr Pro His Arg
50 55 60

His Gly Met Val Ser Asn Gly Arg Leu Gly Val Leu Gly Phe Thr Ala
65 70 75 80

Lys Ser Gly Gly Leu Pro Leu Asp Glu Thr Thr Leu Pro Glu Leu Leu
85 90 95

Lys Glu Ala Gly Tyr Ala Thr Gly Leu Val Gly Lys Trp His Leu Gly
100 105 110

Leu Asn Glu Asn Ser Asp Ala Ala Gly Asp Gly Glu His Leu Pro Leu
115 120 125

Gly Trp Arg Gly Phe Asp Tyr Phe Asp Gly Phe Leu Tyr Gly Ser Pro
130 135 140

Phe Thr Tyr Asp Glu Glu Asn Cys Asp Asn Gly Glu Gly Thr Glu Pro
145 150 155 160

Pro Glu Ala Tyr Pro Glu Gln Gly Trp Leu Pro Gln Ile Leu Gly Tyr
165 170 175

Tyr Leu Thr Asp Leu Leu Ala Asp Lys Ala Leu Gly Leu Leu Asp Val
180 185 190

Ala Ser Ala Ala Gly Arg Leu Leu Ala Lys Ala Leu Ala Ala Ser Arg
195 200 205

Pro Phe Phe Leu Tyr Ile Ser Pro Pro Ala Pro His Phe Ser Ile Leu
210 215 220

Phe Arg Asn Phe Lys Glu Val Ala Gln Pro Tyr Arg Ala Pro Gln Leu
225 230 235 240

Thr Gln Leu Phe Val Asp Glu Ala Ala Asp Phe Ile Glu Arg Asn Lys
245 250 255

Glu Lys Pro Phe Phe Leu Tyr Leu Ala Phe Leu Arg Leu His Val His
260 265 270

Thr Pro Leu Phe Ser Pro Ala Glu Asp Leu Glu Ser Lys Asp Phe Leu
275 280 285

Gly Arg Ser Gln Arg Gly Arg Tyr Gly Asp Leu Val Glu Glu Met Asp
290 295 300

Asp Leu Val Gly Arg Val Leu Asp Ala Leu Glu Asp Leu Gly Leu Leu
305 310 315 320

Asp Asn Thr Leu Val Ile Phe Thr Ser Asp Asn Gly Ala His Leu Glu
325 330 335

Gly Thr Pro Glu Trp Tyr Gly Gly Gly Asn Gly Pro Leu Lys Gly Gly
340 345 350

Lys Gly Tyr Gly Ser Leu Tyr Glu Gly Gly Ile Arg Val Pro Leu Leu
355 360 365

Val Arg Trp Pro Gly Gly Ile Ala Pro Ala Gly Arg Val Lys Glu Lys
370 375 380

Ser Glu Leu Val Ser His Val Asp Leu Ala Pro Thr Ile Leu Asp Leu
385 390 395 400

Ala Gly Ala Pro Leu Pro Lys Val Ala Asn Gly Ala Lys Asp Arg Pro
405 410 415

Leu Asp Gly Val Ser Leu Leu Pro Leu Leu Leu Gly Gly Ala Ala Pro
420 425 430

Ser Arg Arg Ala His Glu Thr Leu Phe His Tyr Asn Gly Lys Gly Arg
435 440 445

Lys Leu Arg Ala Val Arg Trp Pro Arg Lys Ser Gly Lys Thr Pro Lys
450 455 460

Leu Lys Ala His Phe Phe Thr Pro Ala Phe Asp Asp Asp Thr Asn Asn
465 470 475 480

Gly Trp Glu Cys Val Gly Thr Val Ser Gln Ala Asp Asp Ile Glu Asp
485 490 495

Cys Arg Cys Glu Gly Val Glu Thr Val Thr His His Asp Pro Pro Glu
500 505 510

Leu Tyr Asp Leu Ser Arg Asp Pro
515 520



<210> 11 

<211> 1578 

<212> DNA 

<213> homo sapiens 



<400> 11 

atgggctggc tttttctaaa ggttttgttg gcgggagtga gtttctcagg atttctttat 60

cctcttgtgg atttttgcat cagtgggaaa acaagaggac agaagccaaa ctttgtgatt 120

attttggccg atgacatggg gtggggtgac ctgggagcaa actgggcaga aacaaaggac 180

actgccaacc ttgataagat ggcttcggag ggaatgaggt ttgtggattt ccatgcagct 240

gcctccacct gctcaccctc ccgggcttcc ttgctcaccg gccggcttgg ccttcgcaat 300

ggagtcacac gcaactttgc agtcacttct gtgggaggcc ttccgctcaa cgagaccacc 360

ttggcagagg tgctgcagca ggcgggttac gtcactggga taataggcaa atggcatctt 420

ggacaccacg gctcttatca ccccaacttc cgtggttttg attactactt tggaatccca 480

tatagccatg atatgggctg tactgatact ccaggctaca accaccctcc ttgtccagcg 540

tgtccacagg gtgatggacc atcaaggaac cttcaaagag actgttacac tgacgtggcc 600

ctccctcttt atgaaaacct caacattgtg gagcagccgg tgaacttgag cagccttgcc 660

cagaagtatg ctgagaaagc aacccagttc atccagcgtg caagcaccag cgggaggccc 720

ttcctgctct atgtggctct ggcccacatg cacgtgccct tacccgtgac tcagctacca 780

gcagcgccac ggggcagaag cctgtatggt gcagggctct gggagatgga cagtctggtg 840

ggccagatca aggacaaagt tgaccacaca gtgaaggaaa acacattcct ctggtttaca 900

ggagacaatg gcccgtgggc tcagaagtgt gagctagcgg gcagtgtggg tcccttcact 960

ggattttggc aaactcgtca agggggaagt ccagccaagc agacgacctg ggaaggaggg 1020

caccgggtcc cagcactggc ttactggcct ggcagagttc cagttaatgt caccagcact 1080 

gccttgttaa gcgtgctgga catttttcca actgtggtag ccctggccca ggccagctta 1140 

cctcaaggac ggcgctttga tggtgtggac gtctccgagg tgctctttgg ccggtcacag 1200 

cctgggcaca gggtgctgtt ccaccccaac agcggggcag ctggagagtt tggagccctg 1260 

cagactgtcc gcctggagcg ttacaaggcc ttctacatta ccggtggagc cagggcgtgt 1320 

gatgggagca cggggcctga gctgcagcat aagtttcctc tgattttcaa cctggaagac 1380 

gataccgcag aagctgtgcc cctagaaaga ggtggtgcgg agtaccaggc tgtgctgccc 1440

gaggtcagaa aggttcttgc agacgtcctc caagacattg ccaacgacaa catctccagc 1500 

gcagattaca ctcaggaccc ttcagtaact ccctgctgta atccctacca aattgcctgc 1560 

cgctgtcaag ccgcataa 1578 



<210> 12 

<211> 2616 

<212> DNA 

<213> homo sapiens 



<400> 12 

atgaagtatt cttgctgtgc tctggttttg gctgtcctgg gcacagaatt gctgggaagc 60 

ctctgttcga ctgtcagatc cccgaggttc agaggacgga tacagcagga acgaaaaaac 120 

atccgaccca acattattct tgtgcttacc gatgatcaag atgtggagct ggggtccctg 180 

caagtcatga acaaaacgag aaagattatg gaacatgggg gggccacctt catcaatgcc 240 

tttgtgacta cacccatgtg ctgcccgtca cggtcctcca tgctcaccgg gaagtatgtg 300 

cacaatcaca atgtctacac caacaacgag aactgctctt ccccctcgtg gcaggccatg 360 

catgagcctc ggacttttgc tgtatatctt aacaacactg gctacagaac agcctttttt 420 

ggaaaatacc tcaatgaata taatggcagc tacatccccc ctgggtggcg agaatggctt 480

ggattaatca agaattctcg cttctataat tacactgttt gtcgcaatgg catcaaagaa 540 

aagcatggat ttgattatgc aaaggactac ttcacagact taatcactaa cgagagcatt 600 
aattacttca aaatgtctaa gagaatgtat ccccataggc ccgttatgat ggtgatcagc 660

cacgctgcgc cccacggccc cgaggactca gccccacagt tttctaaact gtaccccaat 720 

gcttcccaac acataactcc tagttataac tatgcaccaa atatggataa acactggatt 780

atgcagtaca caggaccaat gctgcccatc cacatggaat ttacaaacat tctacagcgc 840 

aaaaggctcc agactttgat gtcagtggat gattctgtgg agaggctgta taacatgctc 900 

gtggagacgg gggagctgga gaatacttac atcatttaca ccgccgacca tggttaccat 960 

attgggcagt ttggactggt caaggggaaa tccatgccat atgactttga tattcgtgtg 1020 

ccttttttta ttcgtggtcc aagtgtagaa ccaggatcaa tagtcccaca gatcgttctc 1080 

aacattgact tggcccccac gatcctggat attgctgggc tcgacacacc tcctgatgtg 1140 

gacggcaagt ctgtcctcaa acttctggac ccagaaaagc caggtaacag gtttcgaaca 1200 

aacaagaagg ccaaaatttg gcgtgataca ttcctagtgg aaagaggcaa atttctacgt 1260 

aagaaggaag aatccagcaa gaatatccaa cagtcaaatc acttgcccaa atatgaacgg 1320 

gtcaaagaac tatgccagca ggccaggtac cagacagcct gtgaacaacc ggggcagaag 1380 

tggcaatgca ttgaggatac atctggcaag cttcgaattc acaagtgtaa aggacccagt 1440 

gacctgctca cagtccggca gagcacgcgg aacctctacg ctcgcggctt ccatgacaaa 1500 

gacaaagagt gcagttgtag ggagtctggt taccgtgcca gcagaagcca aagaaagagt 1560 

caacggcaat tcttgagaaa ccaggggact ccaaagtaca agcccagatt tgtccatact 1620 

cggcagacac gttccttgtc cgtcgaattt gaaggtgaaa tatatgacat aaatctggaa 1680 

gaagaagaag aattgcaagt gttgcaacca agaaacattg ctaagcgtca tgatgaaggc 1740 

cacaaggggc caagagatct ccaggcttcc agtggtggca acaggggcag gatgctggca 1800 

gatagcagca acgccgtggg cccacctacc actgtccgag tgacacacaa gtgttttatt 1860 

cttcccaatg actctatcca ttgtgagaga gaactgtacc aatcggccag agcgtggaag 1920 

gaccataagg catacattga caaagagatt gaagctctgc aagataaaat taagaattta 1980 

agagaagtga gaggacatct gaagagaagg aagcctgagg aatgtagctg cagtaaacaa 2040 

agctattaca ataaagagaa aggtgtaaaa aagcaagaga aattaaagag ccatcttcac 2100 

ccattcaagg aggctgctca ggaagtagat agcaaactgc aacttttcaa ggagaacaac 2160 

cgtaggagga agaaggagag gaaggagaag agacggcaga ggaaggggga agagtgcagc 2220 

ctgcctggcc tcacttgctt cacgcatgac aacaaccact ggcagacagc cccgttctgg 2280 

aacctgggat ctttctgtgc ttgcacgagt tctaacaata acacctactg gtgtttgcgt 2340 

acagttaatg agacgcataa ttttcttttc tgtgagtttg ctactggctt tttggagtat 2400

tttgatatga atacagatcc ttatcagctc acaaatacag tgcacacggt agaacgaggc 2460

attttgaatc agctacacgt acaactaatg gagctcagaa gctgtcaagg atataagcag 2520 

tgcaacccaa gacctaagaa tcttgatgtt ggaaataaag atggaggaag ctatgaccta 2580 

cacagaggac agttatggga tggatgggaa ggttaa 2616 

<210> 13 

<211> 1710 

<212> DNA 

<213> homo sapiens 

<400> 13 

atgcacaccc tcactggctt ctctctggtc agcctgctca gcttcggcta cctgtcctgg 60 

gactgggcca agccgagctt cgtggccgac gggcccgggg aggctggcga gcagccctcg 120 

gccgctccgc cccagcctcc ccacatcatc ttcatcctca cggacgacca aggctaccac 180 

gacgtgggct accatggttc agatatcgag acccctacgc tggacaggct ggcggccaag 240 

ggggtcaagt tggagaatta ttacatccag cccatctgca cgccttcgcg gagccagctc 300 

ctcactggca ggtaccagat ccacacagga ctccagcatt ccatcatccg cccacagcag 360

cccaactgcc tgcccctgga ccaggtgaca ctgccacaga agctgcagga ggcaggttat 420

tccacccata tggtgggcaa gtggcacctg ggcttctacc ggaaggagtg tctgcccacc 480 

cgtcggggct tcgacacctt cctgggctcg ctcacgggca atgtggacta ttacacctat 540 

gacaactgtg atggcccagg cgtgtgcggc ttcgacctgc acgagggtga gaatgtggcc 600 

tgggggctca gcggccagta ctccactatg ctttacgccc agcgcgccag ccatatcctg 660 

gccagccaca gccctcagcg tcccctcttc ctctatgtgg ccttccaggc agtacacaca 720 

cccctgcagt cccctcgtga gtacctgtac cgctaccgca ccatgggcaa tgtggcccgg 780 

cggaagtacg cggccatggt gacctgcatg gatgaggctg tgcgcaacat cacctgggcc 840 

ctcaagcgct acggtttcta caacaacagt gtcatcatct tctccagtga caatggtggc 900 

cagactttct cggggggcag caactggccg ctccgaggac gcaagggcac ttattgggaa 960 

ggtggcgtgc ggggcctagg ctttgtccac agtcccctgc tcaagcgaaa gcaacggaca 1020 

agccgggcac tgatgcacat cactgactgg tacccgaccc tggtgggtct ggcaggtggt 1080 

accacctcag cagccgatgg gctagatggc tacgacgtgt ggccggccat cagcgagggc 1140 

cgggcctcac cacgcacgga gatcctgcac aacattgacc cactctacaa ccatgcccag 1200 

catggctccc tggagggcgg ctttggcatc tggaacaccg ccgtgcaggc tgccatccgc 1260 

gtgggtgagt ggaagctgct gacaggagac cccggctatg gcgattggat cccaccgcag 1320 

acactggcca ccttcccggg tagctggtgg aacctggaac gaatggccag tgtccgccag 1380 

gccgtgtggc tcttcaacat cagtgctgac ccttatgaac gggaggacct ggctggccag 1440 

cggcctgatg tggtccgcac cctgctggct cgcctggccg aatataaccg cacagccatc 1500 

ccggtacgct acccagctga gaacccccgg gctcatcctg actttaatgg gggtgcttgg 1560 
gggccctggg ccagtgatga ggaagaggag gaagaggaag ggagggctcg aagcttctcc 1620

cggggtcgtc gcaagaaaaa atgcaagatt tgcaagcttc gatccttttt ccgtaaactc 1680 

aacaccaggc taatgtccca acggatctga 1710 

<210> 14 

<211> 1800 

<212> DNA 

<213> homo sapiens 

<400> 14 

atggctccca ggggctgtgc ggggcatccg cctccgcctt ctccacaggc ctgtgtctgt 60 

cctggaaaga tgctagcaat gggggcgctg gcaggattct ggatcctctg cctcctcact 120 

tatggttacc tgtcctgggg ccaggcctta gaagaggagg aagaaggggc cttactagct 180 

caagctggag agaaactaga gcccagcaca acttccacct cccagcccca tctcattttc 240 

atcctagcgg atgatcaggg atttagagat gtgggttacc acggatctga gattaaaaca 300 

cctactcttg acaagctcgc tgccgaagga gttaaactgg agaactacta tgtccagcct 360 

atttgcacac catccaggag tcagtttatt actggaaagt atcagataca caccggactt 420 

caacattcta tcataagacc tacccaaccc aactgtttac ctctggacaa tgccacccta 480

cctcagaaac tgaaggaggt tggatattca acgcatatgg tcggaaaatg gcacttgggt 540 

ttttacagaa aagaatgcat gcccaccaga agaggatttg ataccttttt tggttccctt 600 

ttgggaagtg gggattacta tacacactac aaatgtgaca gtcctgggat gtgtggctat 660 

gacttgtatg aaaacgacaa tgctgcctgg gactatgaca atggcatata ctccacacag 720 

atgtacactc agagagtaca gcaaatctta gcttcccata accccacaaa gcctatattt 780 

ttatatattg cctatcaagc tgttcattca ccactgcaag ctcctggcag gtatttcgaa 840 

cactaccgat ccattatcaa cataaacagg aggagatatg ctgccatgct ttcctgctta 900 

gatgaagcaa tcaacaacgt gacattggct ctaaagactt atggtttcta taacaacagc 960 

attatcattt actcttcaga taatggtggc cagcctacgg caggagggag taactggcct 1020 

ctcagaggta gcaaaggaac atattgggaa ggagggatcc gggctgtagg ctttgtgcat 1080 

agcccacttc tgaaaaacaa gggaacagtg tgtaaggaac ttgtgcacat cactgactgg 1140 

taccccactc tcatttcact ggctgaagga cagattgatg aggacattca actagatggc 1200 

tatgatatct gggagaccat aagtgagggt cttcgctcac cccgagtaga tattttgcat 1260 

aacattgacc ccatatacac caaggcaaaa aatggctcct gggcagcagg ctatgggatc 1320 

tggaacactg caatccagtc agccatcaga gtgcagcact ggaaattgct tacaggaaat 1380 

cctggctaca gcgactgggt cccccctcag tctttcagca acctgggacc gaaccggtgg 1440

cacaatgaac ggatcacctt gtcaactggc aaaagtgtat ggcttttcaa catcacagcc 1500 

gacccatatg agagggtgga cctatctaac aggtatccag gaatcgtgaa gaagctccta 1560 

cggaggctct cacagttcaa caaaactgca gtgccggtca ggtatccccc caaagacccc 1620 

agaagtaacc ctaggctcaa tggaggggtc tggggaccat ggtataaaga ggaaaccaag 1680 

aaaaagaagc caagcaaaaa tcaggctgag aaaaagcaaa agaaaagcaa aaaaaagaag 1740

aagaaacagc agaaagcagt ctcaggttca acttgccatt caggtgttac ttgtggataa 1800 
an extent that no meaningful International Search can be carried out, specifically:

see FURTHER INFORMATION sheet PCT/ISA/210 



[ ] because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:



see additional sheet 



1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all searchable claims.
[ ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.:
[ ] The additional search fees were accompanied by the applicant's protest.

[ ] No protest accompanied the payment of additional search fees.



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998)



BNSDOCID: <WO_015541 1A3J_> 



International Application No. PCT/US 01/03266 

FURTHER INFORMATION CONTINUED FROM PCT/ISA/210











This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-26 (all partly) 

An isolated nucleic acid molecule which comprises a 
nucleotide sequence which is at least 60% identical to SEQ 
ID NO:2, or comprises a fragment of at least 20 nucleotides
of SEQ ID NO: 2 or 11, or encodes a polypeptide with SEQ ID
NO:1, a fragment or an allelic variant thereof, its
complementary strand, host cells containing this sequence,
the encoded polypeptide and antibodies binding thereto, as
well as various diagnostic and therapeutic applications
thereof. 

2. Claims: 1-26 (all partly) 

An isolated nucleic acid molecule which comprises a 
nucleotide sequence which is at least 60% identical to SEQ 
ID NO:4, or comprises a fragment of at least 20 nucleotides
of SEQ ID NO: 4 or 12, or encodes a polypeptide with SEQ ID
NO:3, a fragment or an allelic variant thereof, its
complementary strand, host cells containing this sequence,
the encoded polypeptide and antibodies binding thereto, as
well as various diagnostic and therapeutic applications
thereof.

3. Claims: 1-26 (all partly) 

An isolated nucleic acid molecule which comprises a 
nucleotide sequence which is at least 60% identical to SEQ 
ID NO:6, or comprises a fragment of at least 20 nucleotides
of SEQ ID NO: 6 or 13, or encodes a polypeptide with SEQ ID
NO:5, a fragment or an allelic variant thereof, its
complementary strand, host cells containing this sequence,
the encoded polypeptide and antibodies binding thereto, as
well as various diagnostic and therapeutic applications
thereof. 

4. Claims: 1-26 (all partly) 

An isolated nucleic acid molecule which comprises a 
nucleotide sequence which is at least 60% identical to SEQ 
ID NO:8, or comprises a fragment of at least 20 nucleotides
of SEQ ID NO:8 or 14, or encodes a polypeptide with SEQ ID
NO:7, a fragment or an allelic variant thereof, its
complementary strand, host cells containing this sequence,
the encoded polypeptide and antibodies binding thereto, as
well as various diagnostic and therapeutic applications
thereof.
FURTHER INFORMATION CONTINUED FROM PCT/ISA/210



Continuation of Box I.2

Claims Nos.: 1, 2, 8, 9, 12, 16, 19 and 22

1. The ATCC deposit numbers of the plasmids referred to in claims 1, 2,
8, 9 and 12 are not provided and a search concerning this subject-matter
was therefore not possible.

2. Claims 16 and 22 refer to a compound which binds to a polypeptide of
claim 8 without giving a true technical characterisation. Moreover, no
such compounds are defined in the application, and the search was
therefore limited to antibodies that bind the polypeptide with SEQ ID
NO:1.

3. Claim 19 concerns a kit containing a compound which selectively 
hybridizes to a nucleic acid molecule of claim 1. The search was limited 
to nucleotide sequences that are complementary to SEQ ID NO: 2 or 11. 

The applicant's attention is drawn to the fact that claims, or parts of 
claims, relating to inventions in respect of which no international 
search report has been established need not be the subject of an 
international preliminary examination (Rule 66.1(e) PCT). The applicant
is advised that the EPO policy when acting as an International
Preliminary Examining Authority is normally not to carry out a
preliminary examination on matter which has not been searched. This is
the case irrespective of whether or not the claims are amended following 
receipt of the search report or during any Chapter II procedure. 